Re: [PATCH v4 00/14] security: digest_cache LSM

2024-04-15 Thread Bagas Sanjaya
On Mon, Apr 15, 2024 at 04:24:22PM +0200, Roberto Sassu wrote:
> From: Roberto Sassu 
> 
> Integrity detection and protection have long been desirable features, to
> reach a large user base and mitigate the risk of software flaws and
> attacks.
> 
> However, while solutions exist, they struggle to reach that large user
> base, because they impose higher-than-desired constraints on performance,
> flexibility and configurability that only security-conscious people are
> willing to accept.
> 
> This is where the new digest_cache LSM comes into play: it offers
> additional support for new and existing integrity solutions, to make
> them faster and easier to deploy.
> 
> The full documentation with the motivation and the solution details can be
> found in patch 14.
> 
> The IMA integration patch set will be introduced separately. Also a PoC
> based on the current version of IPE can be provided.
> 

I can't cleanly apply this series (conflict on patch [13/14]). Can you
point out the base commit of this series?

Confused...

-- 
An old man doll... just what I always wanted! - Clara




[PATCH v2 17/17] selftests: riscv: Support xtheadvector in vector tests

2024-04-15 Thread Charlie Jenkins
Extend existing vector tests to be compatible with the xtheadvector
instruction set.

Signed-off-by: Charlie Jenkins 
---
 .../selftests/riscv/vector/v_exec_initval_nolibc.c | 23 --
 tools/testing/selftests/riscv/vector/v_helpers.c   | 24 +-
 tools/testing/selftests/riscv/vector/v_helpers.h   |  4 +-
 tools/testing/selftests/riscv/vector/v_initval.c   | 12 ++-
 .../selftests/riscv/vector/vstate_exec_nolibc.c| 20 +++--
 .../testing/selftests/riscv/vector/vstate_prctl.c  | 91 ++
 6 files changed, 122 insertions(+), 52 deletions(-)
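
For illustration (not part of the patch): a plausible shape of a harness test
that uses the helpers touched below to pick between standard vector and
xtheadvector. NEXT_PROGRAM and the exact control flow are assumptions here,
since the corresponding hunks are truncated in this archive.

  #include "kselftest_harness.h"
  #include "v_helpers.h"

  #define NEXT_PROGRAM "./v_exec_initval_nolibc"  /* assumed helper binary */

  TEST(v_initval)
  {
          int xtheadvector = 0;

          /* Prefer standard V; fall back to the T-Head 0.7.1 flavour. */
          if (!is_vector_supported()) {
                  if (is_xtheadvector_supported())
                          xtheadvector = 1;
                  else
                          SKIP(return, "Vector not supported");
          }

          /* Re-exec the helper, telling it which flavour was detected. */
          ASSERT_EQ(0, launch_test(NEXT_PROGRAM, 0, xtheadvector));
  }

  TEST_HARNESS_MAIN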

diff --git a/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c 
b/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c
index 74b13806baf0..58c29ea91b80 100644
--- a/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c
+++ b/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c
@@ -18,13 +18,22 @@ int main(int argc, char **argv)
unsigned long vl;
int first = 1;
 
-   asm volatile (
-   ".option push\n\t"
-   ".option arch, +v\n\t"
-   "vsetvli%[vl], x0, e8, m1, ta, ma\n\t"
-   ".option pop\n\t"
-   : [vl] "=r" (vl)
-   );
+   if (argc > 2 && strcmp(argv[2], "x"))
+   asm volatile (
+   // 0 | zimm[10:0] | rs1 | 1 1 1 | rd |1010111| vsetvli
+   // vsetvli  t4, x0, e8, m1, d1
+   ".insn  0b011011010111\n\t"
+   "mv %[vl], t4\n\t"
+   : [vl] "=r" (vl) : : "t4"
+   );
+   else
+   asm volatile (
+   ".option push\n\t"
+   ".option arch, +v\n\t"
+   "vsetvli%[vl], x0, e8, m1, ta, ma\n\t"
+   ".option pop\n\t"
+   : [vl] "=r" (vl)
+   );
 
 #define CHECK_VECTOR_REGISTER(register) ({ 
\
for (int i = 0; i < vl; i++) {  
\
diff --git a/tools/testing/selftests/riscv/vector/v_helpers.c 
b/tools/testing/selftests/riscv/vector/v_helpers.c
index 15c22318db72..338ba577536d 100644
--- a/tools/testing/selftests/riscv/vector/v_helpers.c
+++ b/tools/testing/selftests/riscv/vector/v_helpers.c
@@ -1,11 +1,28 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
 #include "../hwprobe/hwprobe.h"
+#include 
 #include 
 #include 
 #include 
 #include 
 
+int is_xtheadvector_supported(void)
+{
+   struct riscv_hwprobe pair;
+
+   pair.key = RISCV_HWPROBE_KEY_MVENDORID;
+   riscv_hwprobe(&pair, 1, 0, NULL, 0);
+
+   if (pair.value == 0x5b7) {
+   pair.key = RISCV_HWPROBE_KEY_VENDOR_EXT_0;
+   riscv_hwprobe(&pair, 1, 0, NULL, 0);
+   return pair.value & RISCV_HWPROBE_VENDOR_EXT_XTHEADVECTOR;
+   } else {
+   return 0;
+   }
+}
+
 int is_vector_supported(void)
 {
struct riscv_hwprobe pair;
@@ -15,9 +32,9 @@ int is_vector_supported(void)
return pair.value & RISCV_HWPROBE_IMA_V;
 }
 
-int launch_test(char *next_program, int test_inherit)
+int launch_test(char *next_program, int test_inherit, int xtheadvector)
 {
-   char *exec_argv[3], *exec_envp[1];
+   char *exec_argv[4], *exec_envp[1];
int rc, pid, status;
 
pid = fork();
@@ -29,7 +46,8 @@ int launch_test(char *next_program, int test_inherit)
if (!pid) {
exec_argv[0] = next_program;
exec_argv[1] = test_inherit != 0 ? "x" : NULL;
-   exec_argv[2] = NULL;
+   exec_argv[2] = xtheadvector != 0 ? "x" : NULL;
+   exec_argv[3] = NULL;
exec_envp[0] = NULL;
/* launch the program again to check inherit */
rc = execve(next_program, exec_argv, exec_envp);
diff --git a/tools/testing/selftests/riscv/vector/v_helpers.h 
b/tools/testing/selftests/riscv/vector/v_helpers.h
index 88719c4be496..67d41cb6f871 100644
--- a/tools/testing/selftests/riscv/vector/v_helpers.h
+++ b/tools/testing/selftests/riscv/vector/v_helpers.h
@@ -1,5 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 
+int is_xtheadvector_supported(void);
+
 int is_vector_supported(void);
 
-int launch_test(char *next_program, int test_inherit);
+int launch_test(char *next_program, int test_inherit, int xtheadvector);
diff --git a/tools/testing/selftests/riscv/vector/v_initval.c 
b/tools/testing/selftests/riscv/vector/v_initval.c
index f38b5797fa31..be9e1d18ad29 100644
--- a/tools/testing/selftests/riscv/vector/v_initval.c
+++ b/tools/testing/selftests/riscv/vector/v_initval.c
@@ -7,10 +7,16 @@
 
 TEST(v_initval)
 {
-   if (!is_vector_supported())
-   SKIP(return, "Vector not supported");
+   int xtheadvector = 0;
 
-   ASSERT_EQ(0, launch_test(NEXT_PROGRAM, 0));
+   if (!is_vector_supported()) {
+   if 

[PATCH v2 16/17] selftests: riscv: Fix vector tests

2024-04-15 Thread Charlie Jenkins
Overhaul the riscv vector tests to use kselftest_harness, to help the
test cases correctly report the results and decouple the individual test
cases from each other. With this refactoring, only run the test cases if
vector is reported as supported, and properly report the test cases as
skipped otherwise. The v_initval_nolibc test previously did not check
whether vector was supported, and used a function (malloc) that
invalidates the state of the vector registers.

Signed-off-by: Charlie Jenkins 
---
 tools/testing/selftests/riscv/vector/.gitignore|   3 +-
 tools/testing/selftests/riscv/vector/Makefile  |  17 +-
 .../selftests/riscv/vector/v_exec_initval_nolibc.c |  84 +++
 tools/testing/selftests/riscv/vector/v_helpers.c   |  56 +
 tools/testing/selftests/riscv/vector/v_helpers.h   |   5 +
 tools/testing/selftests/riscv/vector/v_initval.c   |  16 ++
 .../selftests/riscv/vector/v_initval_nolibc.c  |  68 --
 .../testing/selftests/riscv/vector/vstate_prctl.c  | 266 -
 8 files changed, 324 insertions(+), 191 deletions(-)

diff --git a/tools/testing/selftests/riscv/vector/.gitignore 
b/tools/testing/selftests/riscv/vector/.gitignore
index 9ae7964491d5..7d9c87cd0649 100644
--- a/tools/testing/selftests/riscv/vector/.gitignore
+++ b/tools/testing/selftests/riscv/vector/.gitignore
@@ -1,3 +1,4 @@
 vstate_exec_nolibc
 vstate_prctl
-v_initval_nolibc
+v_initval
+v_exec_initval_nolibc
diff --git a/tools/testing/selftests/riscv/vector/Makefile 
b/tools/testing/selftests/riscv/vector/Makefile
index bfff0ff4f3be..995746359477 100644
--- a/tools/testing/selftests/riscv/vector/Makefile
+++ b/tools/testing/selftests/riscv/vector/Makefile
@@ -2,18 +2,27 @@
 # Copyright (C) 2021 ARM Limited
 # Originally tools/testing/arm64/abi/Makefile
 
-TEST_GEN_PROGS := vstate_prctl v_initval_nolibc
-TEST_GEN_PROGS_EXTENDED := vstate_exec_nolibc
+TEST_GEN_PROGS := v_initval vstate_prctl
+TEST_GEN_PROGS_EXTENDED := vstate_exec_nolibc v_exec_initval_nolibc 
sys_hwprobe.o v_helpers.o
 
 include ../../lib.mk
 
-$(OUTPUT)/vstate_prctl: vstate_prctl.c ../hwprobe/sys_hwprobe.S
+$(OUTPUT)/sys_hwprobe.o: ../hwprobe/sys_hwprobe.S
+   $(CC) -static -c -o$@ $(CFLAGS) $^
+
+$(OUTPUT)/v_helpers.o: v_helpers.c
+   $(CC) -static -c -o$@ $(CFLAGS) $^
+
+$(OUTPUT)/vstate_prctl: vstate_prctl.c $(OUTPUT)/sys_hwprobe.o 
$(OUTPUT)/v_helpers.o
$(CC) -static -o$@ $(CFLAGS) $(LDFLAGS) $^
 
 $(OUTPUT)/vstate_exec_nolibc: vstate_exec_nolibc.c
$(CC) -nostdlib -static -include ../../../../include/nolibc/nolibc.h \
-Wall $(CFLAGS) $(LDFLAGS) $^ -o $@ -lgcc
 
-$(OUTPUT)/v_initval_nolibc: v_initval_nolibc.c
+$(OUTPUT)/v_initval: v_initval.c $(OUTPUT)/sys_hwprobe.o $(OUTPUT)/v_helpers.o
+   $(CC) -static -o$@ $(CFLAGS) $(LDFLAGS) $^
+
+$(OUTPUT)/v_exec_initval_nolibc: v_exec_initval_nolibc.c
$(CC) -nostdlib -static -include ../../../../include/nolibc/nolibc.h \
-Wall $(CFLAGS) $(LDFLAGS) $^ -o $@ -lgcc
diff --git a/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c 
b/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c
new file mode 100644
index ..74b13806baf0
--- /dev/null
+++ b/tools/testing/selftests/riscv/vector/v_exec_initval_nolibc.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Get values of vector registers as soon as the program starts to test if
+ * is properly cleaning the values before starting a new program. Vector
+ * registers are caller saved, so no function calls may happen before reading
+ * the values. To further ensure consistency, this file is compiled without
+ * libc and without auto-vectorization.
+ *
+ * To be "clean" all values must be either all ones or all zeroes.
+ */
+
+#define __stringify_1(x...)#x
+#define __stringify(x...)  __stringify_1(x)
+
+int main(int argc, char **argv)
+{
+   char prev_value = 0, value;
+   unsigned long vl;
+   int first = 1;
+
+   asm volatile (
+   ".option push\n\t"
+   ".option arch, +v\n\t"
+   "vsetvli%[vl], x0, e8, m1, ta, ma\n\t"
+   ".option pop\n\t"
+   : [vl] "=r" (vl)
+   );
+
+#define CHECK_VECTOR_REGISTER(register) ({ 
\
+   for (int i = 0; i < vl; i++) {  
\
+   asm volatile (  
\
+   ".option push\n\t"  
\
+   ".option arch, +v\n\t"  
\
+   "vmv.x.s %0, " __stringify(register) "\n\t" 
\
+   "vsrl.vi " __stringify(register) ", " 
__stringify(register) ", 8\n\t" \
+   ".option pop\n\t"   
\
+   : "=r" (value));
\
+   if (first) {   

[PATCH v2 15/17] riscv: hwprobe: Document vendor extensions and xtheadvector extension

2024-04-15 Thread Charlie Jenkins
Document support for vendor extensions using the key
RISCV_HWPROBE_KEY_VENDOR_EXT_0 and the xtheadvector extension using the
key RISCV_ISA_VENDOR_EXT_XTHEADVECTOR.

Signed-off-by: Charlie Jenkins 
---
 Documentation/arch/riscv/hwprobe.rst | 12 
 1 file changed, 12 insertions(+)

diff --git a/Documentation/arch/riscv/hwprobe.rst 
b/Documentation/arch/riscv/hwprobe.rst
index b2bcc9eed9aa..38e1b0c7c38c 100644
--- a/Documentation/arch/riscv/hwprobe.rst
+++ b/Documentation/arch/riscv/hwprobe.rst
@@ -210,3 +210,15 @@ The following keys are defined:
 
 * :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which
   represents the size of the Zicboz block in bytes.
+
+* :c:macro:`RISCV_HWPROBE_KEY_VENDOR_EXT_0`: A bitmask containing the vendor
+  extensions that are compatible with the
+  :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: base system behavior. A set of
+  CPUs is only compatible with a vendor extension if all CPUs in the set have
+  the same mvendorid and support the extension.
+
+  * T-HEAD
+
+* :c:macro:`RISCV_ISA_VENDOR_EXT_XTHEADVECTOR`: The xtheadvector vendor
+extension is supported in the T-Head ISA extensions spec starting from
+   commit a18c801634 ("Add T-Head VECTOR vendor extension. ").

-- 
2.44.0




[PATCH v2 14/17] riscv: hwprobe: Add vendor extension probing

2024-04-15 Thread Charlie Jenkins
Add a new hwprobe key "RISCV_HWPROBE_KEY_VENDOR_EXT_0" which allows
userspace to probe for the new RISCV_ISA_VENDOR_EXT_XTHEADVECTOR vendor
extension.

This new key will allow userspace code to probe for which vendor
extensions are supported. This API is modeled to be consistent with
RISCV_HWPROBE_KEY_IMA_EXT_0. The bitmask returned will have each bit
corresponding to a supported vendor extension of the cpumask set. Just
like RISCV_HWPROBE_KEY_IMA_EXT_0, this allows a userspace program to
determine all of the supported vendor extensions in one call.

The vendor extensions are namespaced per vendor. For example, if all of
the cpus in the cpumask have an mvendorid of THEAD_VENDOR_ID, bit 0
being set means that RISCV_HWPROBE_VENDOR_EXT_XTHEADVECTOR is supported.
If the mvendorid is instead VENDOR2, bit 0 being set will imply a
different available extension. This allows for a single hwprobe call
that can be applicable to any vendor.

Signed-off-by: Charlie Jenkins 
---
 arch/riscv/include/asm/hwprobe.h   |  4 +--
 arch/riscv/include/uapi/asm/hwprobe.h  | 11 ++-
 arch/riscv/include/uapi/asm/vendor/thead.h |  3 ++
 arch/riscv/kernel/sys_hwprobe.c| 50 ++
 4 files changed, 65 insertions(+), 3 deletions(-)
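
For illustration (not part of the patch), a userspace caller could probe the
new key roughly as below. This assumes the uapi headers added/updated by this
series are installed as <asm/hwprobe.h> and <asm/vendor/thead.h>, and that no
libc wrapper for hwprobe is available, so the raw syscall is used; 0x5b7 is
the T-Head mvendorid used elsewhere in this series.

  #include <asm/hwprobe.h>
  #include <asm/vendor/thead.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static long hwprobe(struct riscv_hwprobe *pairs, size_t cnt)
  {
          /* pairs, pair_count, cpusetsize, cpus, flags */
          return syscall(__NR_riscv_hwprobe, pairs, cnt, 0, NULL, 0);
  }

  int supports_xtheadvector(void)
  {
          struct riscv_hwprobe pairs[] = {
                  { .key = RISCV_HWPROBE_KEY_MVENDORID },
                  { .key = RISCV_HWPROBE_KEY_VENDOR_EXT_0 },
          };

          if (hwprobe(pairs, 2))
                  return 0;

          /* Vendor-extension bits are only meaningful for the probed vendor. */
          return pairs[0].value == 0x5b7 &&
                 (pairs[1].value & RISCV_HWPROBE_VENDOR_EXT_XTHEADVECTOR);
  }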

diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
index 1378c3c9401a..3bcb291eb386 100644
--- a/arch/riscv/include/asm/hwprobe.h
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
- * Copyright 2023 Rivos, Inc
+ * Copyright 2023-2024 Rivos, Inc
  */
 
 #ifndef _ASM_HWPROBE_H
@@ -9,7 +9,7 @@
 #include 
 #include 
 
-#define RISCV_HWPROBE_MAX_KEY 6
+#define RISCV_HWPROBE_MAX_KEY 7
 
 static inline bool riscv_hwprobe_key_is_valid(__s64 key)
 {
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h 
b/arch/riscv/include/uapi/asm/hwprobe.h
index 9f2a8e3ff204..142b5c37730b 100644
--- a/arch/riscv/include/uapi/asm/hwprobe.h
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
- * Copyright 2023 Rivos, Inc
+ * Copyright 2023-2024 Rivos, Inc
  */
 
 #ifndef _UAPI_ASM_HWPROBE_H
@@ -67,6 +67,15 @@ struct riscv_hwprobe {
 #defineRISCV_HWPROBE_MISALIGNED_UNSUPPORTED(4 << 0)
 #defineRISCV_HWPROBE_MISALIGNED_MASK   (7 << 0)
 #define RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE6
+/*
+ * It is not possible for one CPU to have multiple vendor ids, so each vendor
+ * has its own vendor extension "namespace". The keys for each vendor start
+ * at zero.
+ *
+ * All vendor extension keys live in a vendor-specific header under
+ * arch/riscv/include/uapi/asm/vendor
+ */
+#define RISCV_HWPROBE_KEY_VENDOR_EXT_0 7
 /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
 
 /* Flags */
diff --git a/arch/riscv/include/uapi/asm/vendor/thead.h 
b/arch/riscv/include/uapi/asm/vendor/thead.h
new file mode 100644
index ..43790ebe5faf
--- /dev/null
+++ b/arch/riscv/include/uapi/asm/vendor/thead.h
@@ -0,0 +1,3 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#defineRISCV_HWPROBE_VENDOR_EXT_XTHEADVECTOR   (1 << 0)
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index 394f1343490c..15ce916a7321 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -139,6 +139,52 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair,
pair->value &= ~missing;
 }
 
+static void hwprobe_isa_vendor_ext0(struct riscv_hwprobe *pair,
+   const struct cpumask *cpus)
+{
+   int cpu;
+   u64 missing = 0;
+
+   pair->value = 0;
+
+   struct riscv_hwprobe mvendorid = {
+   .key = RISCV_HWPROBE_KEY_MVENDORID,
+   .value = 0
+   };
+
+   hwprobe_arch_id(&mvendorid, cpus);
+
+   /* Set value to zero if CPUs in the set do not have the same vendor. */
+   if (mvendorid.value == -1ULL)
+   return;
+
+   /*
+* Loop through and record vendor extensions that 1) anyone has, and
+* 2) anyone doesn't have.
+*/
+   for_each_cpu(cpu, cpus) {
+   struct riscv_isainfo *isavendorinfo = &hart_isa_vendor[cpu];
+
+#define VENDOR_EXT_KEY(vendor, ext)
\
+   do {
\
+   if (mvendorid.value == (vendor) &&  
\
+   __riscv_isa_vendor_extension_available(isavendorinfo->isa,  
\
+  
RISCV_ISA_VENDOR_EXT_##ext)) \
+   pair->value |= RISCV_HWPROBE_VENDOR_EXT_##ext;  
\
+   else
\
+   missing |= 

[PATCH v2 13/17] riscv: vector: Support xtheadvector save/restore

2024-04-15 Thread Charlie Jenkins
Use alternatives to add support for xtheadvector vector save/restore
routines.

Signed-off-by: Charlie Jenkins 
---
 arch/riscv/Kconfig |   2 +
 arch/riscv/Kconfig.vendor  |  11 ++
 arch/riscv/include/asm/csr.h   |   6 +
 arch/riscv/include/asm/switch_to.h |   2 +-
 arch/riscv/include/asm/vector.h| 246 ++---
 arch/riscv/kernel/cpufeature.c |   2 +-
 arch/riscv/kernel/kernel_mode_vector.c |   8 +-
 arch/riscv/kernel/process.c|   4 +-
 arch/riscv/kernel/signal.c |   6 +-
 arch/riscv/kernel/vector.c |  35 +++--
 10 files changed, 250 insertions(+), 72 deletions(-)
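
For orientation (an illustration, not part of the diff): standard vector keeps
its status field in sstatus bits [10:9] (SR_VS), while xtheadvector uses bits
[8:7] (SR_VS_THEAD). The helpers added below hide that difference; intended
usage looks roughly like this sketch, where regs->status is a saved status
image and save_vector_state() is a hypothetical placeholder:

  /* Mark the vector unit initial for the given task. */
  regs->status = __riscv_v_vstate_or(regs->status, INITIAL);

  /* Only save the register file if user space dirtied it. */
  if (__riscv_v_vstate_check(regs->status, DIRTY))
          save_vector_state(task);        /* hypothetical helper */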

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56..fec86fba3acd 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -759,6 +759,8 @@ config RISCV_EFFICIENT_UNALIGNED_ACCESS
 
 endchoice
 
+source "arch/riscv/Kconfig.vendor"
+
 endmenu # "Platform type"
 
 menu "Kernel features"
diff --git a/arch/riscv/Kconfig.vendor b/arch/riscv/Kconfig.vendor
new file mode 100644
index ..be7bd3b4d936
--- /dev/null
+++ b/arch/riscv/Kconfig.vendor
@@ -0,0 +1,11 @@
+config RISCV_ISA_XTHEADVECTOR
+   bool "xtheadvector extension support"
+   depends on RISCV_ISA_V
+   depends on FPU
+   default y
+   help
+ Say N here if you want to disable all xtheadvector related procedures
+ in the kernel. This will disable vector for any T-Head board that
+ contains xtheadvector rather than the standard vector.
+
+ If you don't know what to do here, say Y.
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index e5a35efd56e0..13657d096e7d 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -30,6 +30,12 @@
 #define SR_VS_CLEAN_AC(0x0400, UL)
 #define SR_VS_DIRTY_AC(0x0600, UL)
 
+#define SR_VS_THEAD_AC(0x0180, UL) /* xtheadvector Status */
+#define SR_VS_OFF_THEAD_AC(0x, UL)
+#define SR_VS_INITIAL_THEAD_AC(0x0080, UL)
+#define SR_VS_CLEAN_THEAD  _AC(0x0100, UL)
+#define SR_VS_DIRTY_THEAD  _AC(0x0180, UL)
+
 #define SR_XS  _AC(0x00018000, UL) /* Extension Status */
 #define SR_XS_OFF  _AC(0x, UL)
 #define SR_XS_INITIAL  _AC(0x8000, UL)
diff --git a/arch/riscv/include/asm/switch_to.h 
b/arch/riscv/include/asm/switch_to.h
index 7efdb0584d47..ada6b5cf2d94 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -78,7 +78,7 @@ do {  \
struct task_struct *__next = (next);\
if (has_fpu())  \
__switch_to_fpu(__prev, __next);\
-   if (has_vector())   \
+   if (has_vector() || has_xtheadvector()) \
__switch_to_vector(__prev, __next); \
((last) = __switch_to(__prev, __next)); \
 } while (0)
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 731dcd0ed4de..9871f59c7cfc 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -18,6 +18,26 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+
+#define __riscv_v_vstate_or(_val, TYPE) ({ \
+   typeof(_val) _res = _val;   \
+   if (has_xtheadvector()) \
+   _res = (_res & ~SR_VS_THEAD) | SR_VS_##TYPE##_THEAD;\
+   else\
+   _res = (_res & ~SR_VS) | SR_VS_##TYPE;  \
+   _res;   \
+})
+
+#define __riscv_v_vstate_check(_val, TYPE) ({  \
+   bool _res;  \
+   if (has_xtheadvector()) \
+   _res = ((_val) & SR_VS_THEAD) == SR_VS_##TYPE##_THEAD;  \
+   else\
+   _res = ((_val) & SR_VS) == SR_VS_##TYPE;\
+   _res;   \
+})
 
 extern unsigned long riscv_v_vsize;
 int riscv_v_setup_vsize(void);
@@ -40,39 +60,62 @@ static __always_inline bool has_vector(void)
return riscv_has_extension_unlikely(RISCV_ISA_EXT_v);
 }
 
+static __always_inline bool has_xtheadvector_no_alternatives(void)
+{
+   if (IS_ENABLED(CONFIG_RISCV_ISA_XTHEADVECTOR) && hart_isa_vendorid == 
THEAD_VENDOR_ID)
+   return riscv_isa_vendor_extension_available(NULL, XTHEADVECTOR);
+   else
+   return false;
+}
+
+static __always_inline bool has_xtheadvector(void)
+{
+   if (IS_ENABLED(CONFIG_RISCV_ISA_XTHEADVECTOR))
+   return riscv_has_vendor_extension_unlikely(THEAD_VENDOR_ID,
+  

[PATCH v2 11/17] riscv: csr: Add CSR encodings for VCSR_VXRM/VCSR_VXSAT

2024-04-15 Thread Charlie Jenkins
The VXRM vector csr for xtheadvector has an encoding of 0xa and VXSAT
has an encoding of 0x9.

Co-developed-by: Heiko Stuebner 
Signed-off-by: Charlie Jenkins 
---
 arch/riscv/include/asm/csr.h | 2 ++
 1 file changed, 2 insertions(+)
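
For context (a sketch, not part of the patch): on a 0.7.1/xtheadvector part
the fixed-point state is read through these dedicated CSR numbers rather than
through the single vcsr register, roughly:

  /* Assumes the kernel's csr_read() and the constants added here. */
  unsigned long vxrm  = csr_read(VCSR_VXRM);   /* CSR 0xa */
  unsigned long vxsat = csr_read(VCSR_VXSAT);  /* CSR 0x9 */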

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 13bc99c995d1..e5a35efd56e0 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -219,6 +219,8 @@
 #define VCSR_VXRM_MASK 3
 #define VCSR_VXRM_SHIFT1
 #define VCSR_VXSAT_MASK1
+#define VCSR_VXSAT 0x9
+#define VCSR_VXRM  0xa
 
 /* symbolic CSR names: */
 #define CSR_CYCLE  0xc00

-- 
2.44.0




[PATCH v2 10/17] RISC-V: define the elements of the VCSR vector CSR

2024-04-15 Thread Charlie Jenkins
From: Heiko Stuebner 

The VCSR CSR contains two elements VXRM[2:1] and VXSAT[0].

Define constants for those to access the elements in a readable way.

Acked-by: Guo Ren 
Reviewed-by: Conor Dooley 
Signed-off-by: Heiko Stuebner 
Signed-off-by: Charlie Jenkins 
---
 arch/riscv/include/asm/csr.h | 5 +
 1 file changed, 5 insertions(+)
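
As a small worked example of the layout (VXRM in vcsr[2:1], VXSAT in vcsr[0]),
a saved vcsr value can be decomposed with the new constants:

  /* Sketch only. */
  unsigned long vxrm  = (vcsr >> VCSR_VXRM_SHIFT) & VCSR_VXRM_MASK;  /* bits [2:1] */
  unsigned long vxsat = vcsr & VCSR_VXSAT_MASK;                      /* bit  [0]   */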

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 2468c55933cd..13bc99c995d1 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -215,6 +215,11 @@
 #define SMSTATEEN0_SSTATEEN0_SHIFT 63
 #define SMSTATEEN0_SSTATEEN0   (_ULL(1) << SMSTATEEN0_SSTATEEN0_SHIFT)
 
+/* VCSR flags */
+#define VCSR_VXRM_MASK 3
+#define VCSR_VXRM_SHIFT1
+#define VCSR_VXSAT_MASK1
+
 /* symbolic CSR names: */
 #define CSR_CYCLE  0xc00
 #define CSR_TIME   0xc01

-- 
2.44.0




[PATCH v2 09/17] riscv: uaccess: Add alternative for xtheadvector uaccess

2024-04-15 Thread Charlie Jenkins
At this time, use the fallback uaccess routines rather than customizing
the vectorized uaccess routines to be compatible with xtheadvector.

Signed-off-by: Charlie Jenkins 
---
 arch/riscv/lib/uaccess.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
index bc22c078aba8..1fe798666aee 100644
--- a/arch/riscv/lib/uaccess.S
+++ b/arch/riscv/lib/uaccess.S
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
.macro fixup op reg addr lbl
 100:
@@ -15,6 +16,7 @@
 SYM_FUNC_START(__asm_copy_to_user)
 #ifdef CONFIG_RISCV_ISA_V
ALTERNATIVE("j fallback_scalar_usercopy", "nop", 0, RISCV_ISA_EXT_v, 
CONFIG_RISCV_ISA_V)
+   ALTERNATIVE("nop", "j fallback_scalar_usercopy", THEAD_VENDOR_ID, 
RISCV_ISA_VENDOR_EXT_XTHEADVECTOR, CONFIG_RISCV_ISA_XTHEADVECTOR)
REG_L   t0, riscv_v_usercopy_threshold
bltua2, t0, fallback_scalar_usercopy
tail enter_vector_usercopy

-- 
2.44.0




[PATCH v2 08/17] riscv: drivers: Convert xandespmu to use the vendor extension framework

2024-04-15 Thread Charlie Jenkins
Migrate xandespmu out of riscv_isa_ext and into a new Andes-specific
vendor namespace.

Signed-off-by: Charlie Jenkins 
---
 arch/riscv/include/asm/hwcap.h |  4 +++-
 arch/riscv/include/asm/vendor_extensions.h |  3 +++
 arch/riscv/kernel/cpufeature.c |  1 -
 arch/riscv/kernel/vendor_extensions.c  |  4 
 arch/riscv/kernel/vendor_extensions/Makefile   |  1 +
 arch/riscv/kernel/vendor_extensions/andes_extensions.c | 13 +
 drivers/perf/riscv_pmu_sbi.c   |  7 ---
 7 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 38157be5becd..4b986e4b56f2 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -80,7 +80,6 @@
 #define RISCV_ISA_EXT_ZFA  71
 #define RISCV_ISA_EXT_ZTSO 72
 #define RISCV_ISA_EXT_ZACAS73
-#define RISCV_ISA_EXT_XANDESPMU74
 
 #define RISCV_ISA_EXT_XLINUXENVCFG 127
 
@@ -103,6 +102,9 @@
  */
 #define RISCV_ISA_VENDOR_EXT_BASE  0x8000
 
+/* Andes Vendor Extensions */
+#define RISCV_ISA_VENDOR_EXT_XANDESPMU 0x8000
+
 /* THead Vendor Extensions */
 #define RISCV_ISA_VENDOR_EXT_XTHEADVECTOR  0x8000
 
diff --git a/arch/riscv/include/asm/vendor_extensions.h 
b/arch/riscv/include/asm/vendor_extensions.h
index 0a1955e1c900..33a430cc50cb 100644
--- a/arch/riscv/include/asm/vendor_extensions.h
+++ b/arch/riscv/include/asm/vendor_extensions.h
@@ -9,6 +9,9 @@
 extern const struct riscv_isa_ext_data riscv_isa_vendor_ext_thead[];
 extern const size_t riscv_isa_vendor_ext_count_thead;
 
+extern const struct riscv_isa_ext_data riscv_isa_vendor_ext_andes[];
+extern const size_t riscv_isa_vendor_ext_count_andes;
+
 bool get_isa_vendor_ext(unsigned long vendorid, const struct 
riscv_isa_ext_data **isa_vendor_ext,
size_t *count);
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 799ec2d2e9e0..949c06970c4f 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -321,7 +321,6 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
-   __RISCV_ISA_EXT_DATA(xandespmu, RISCV_ISA_EXT_XANDESPMU),
 };
 
 const size_t riscv_isa_ext_count = ARRAY_SIZE(riscv_isa_ext);
diff --git a/arch/riscv/kernel/vendor_extensions.c 
b/arch/riscv/kernel/vendor_extensions.c
index 3a8a6c6dd34e..c5ca02ce1bb1 100644
--- a/arch/riscv/kernel/vendor_extensions.c
+++ b/arch/riscv/kernel/vendor_extensions.c
@@ -21,6 +21,10 @@ bool __init get_isa_vendor_ext(unsigned long vendorid,
*isa_vendor_ext = riscv_isa_vendor_ext_thead;
*count = riscv_isa_vendor_ext_count_thead;
break;
+   case ANDES_VENDOR_ID:
+   *isa_vendor_ext = riscv_isa_vendor_ext_andes;
+   *count = riscv_isa_vendor_ext_count_andes;
+   break;
default:
*isa_vendor_ext = NULL;
*count = 0;
diff --git a/arch/riscv/kernel/vendor_extensions/Makefile 
b/arch/riscv/kernel/vendor_extensions/Makefile
index dcf3de8d4658..8014594aafa1 100644
--- a/arch/riscv/kernel/vendor_extensions/Makefile
+++ b/arch/riscv/kernel/vendor_extensions/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
+obj-y  += andes_extensions.o
 obj-y  += thead_extensions.o
diff --git a/arch/riscv/kernel/vendor_extensions/andes_extensions.c 
b/arch/riscv/kernel/vendor_extensions/andes_extensions.c
new file mode 100644
index ..b7450f99bfb5
--- /dev/null
+++ b/arch/riscv/kernel/vendor_extensions/andes_extensions.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include 
+#include 
+#include 
+
+#include 
+
+const struct riscv_isa_ext_data riscv_isa_vendor_ext_andes[] = {
+   __RISCV_ISA_EXT_DATA(xandespmu, RISCV_ISA_VENDOR_EXT_XANDESPMU),
+};
+
+const size_t riscv_isa_vendor_ext_count_andes = 
ARRAY_SIZE(riscv_isa_vendor_ext_andes);
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 8cbe6e5f9c39..13e37296cb5f 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define ALT_SBI_PMU_OVERFLOW(__ovl)\
 asm volatile(ALTERNATIVE_2(\
@@ -32,7 +33,7 @@ asm volatile(ALTERNATIVE_2(   
\
THEAD_VENDOR_ID, ERRATA_THEAD_PMU,  \
CONFIG_ERRATA_THEAD_PMU,\
"csrr %0, " __stringify(ANDES_CSR_SCOUNTEROF),  \
-   0, RISCV_ISA_EXT_XANDESPMU, 

[PATCH v2 07/17] riscv: Introduce vendor variants of extension helpers

2024-04-15 Thread Charlie Jenkins
Vendor extensions are maintained in riscv_isa_vendor (separate from
standard extensions which live in riscv_isa). Create vendor variants for
the existing extension helpers to interface with the riscv_isa_vendor
bitmap. There is a good amount of overlap between these functions, so
the alternative checking code can be factored out.

Signed-off-by: Charlie Jenkins 
---
 arch/riscv/errata/sifive/errata.c   |   2 +
 arch/riscv/errata/thead/errata.c|   2 +
 arch/riscv/include/asm/cpufeature.h | 142 +++-
 arch/riscv/include/asm/hwprobe.h|   3 +
 arch/riscv/kernel/cpufeature.c  |  53 --
 arch/riscv/kernel/sys_hwprobe.c |   4 +-
 6 files changed, 161 insertions(+), 45 deletions(-)
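
For context, a short usage sketch (not part of the diff) mirroring how later
patches in this series consume the new helpers; the do_xtheadvector_*()
callees are hypothetical placeholders:

  /* Fast-path check, patched via alternatives where available. */
  if (riscv_has_vendor_extension_unlikely(THEAD_VENDOR_ID,
                                          RISCV_ISA_VENDOR_EXT_XTHEADVECTOR))
          do_xtheadvector_path();

  /* Bitmap check, usable before alternatives are applied. */
  if (riscv_isa_vendor_extension_available(NULL, XTHEADVECTOR))
          do_xtheadvector_setup();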

diff --git a/arch/riscv/errata/sifive/errata.c 
b/arch/riscv/errata/sifive/errata.c
index 3d9a32d791f7..847ff85cc911 100644
--- a/arch/riscv/errata/sifive/errata.c
+++ b/arch/riscv/errata/sifive/errata.c
@@ -99,6 +99,8 @@ void sifive_errata_patch_func(struct alt_entry *begin, struct 
alt_entry *end,
for (alt = begin; alt < end; alt++) {
if (alt->vendor_id != SIFIVE_VENDOR_ID)
continue;
+   if (alt->patch_id >= RISCV_ISA_VENDOR_EXT_BASE)
+   continue;
if (alt->patch_id >= ERRATA_SIFIVE_NUMBER) {
WARN(1, "This errata id:%d is not in kernel errata 
list", alt->patch_id);
continue;
diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
index b1c410bbc1ae..6e3eabfe92af 100644
--- a/arch/riscv/errata/thead/errata.c
+++ b/arch/riscv/errata/thead/errata.c
@@ -163,6 +163,8 @@ void thead_errata_patch_func(struct alt_entry *begin, 
struct alt_entry *end,
for (alt = begin; alt < end; alt++) {
if (alt->vendor_id != THEAD_VENDOR_ID)
continue;
+   if (alt->patch_id >= RISCV_ISA_VENDOR_EXT_BASE)
+   continue;
if (alt->patch_id >= ERRATA_THEAD_NUMBER)
continue;
 
diff --git a/arch/riscv/include/asm/cpufeature.h 
b/arch/riscv/include/asm/cpufeature.h
index 50fa174cccb9..12dd36bafa2a 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -110,23 +110,19 @@ bool __riscv_isa_extension_available(const unsigned long 
*isa_bitmap, unsigned i
 #define riscv_isa_extension_available(isa_bitmap, ext) \
__riscv_isa_extension_available(isa_bitmap, RISCV_ISA_EXT_##ext)
 
+bool __riscv_isa_vendor_extension_available(const unsigned long 
*vendor_isa_bitmap,
+   unsigned int bit);
+#define riscv_isa_vendor_extension_available(isa_bitmap, ext)  \
+   __riscv_isa_vendor_extension_available(isa_bitmap, 
RISCV_ISA_VENDOR_EXT_##ext)
+
 static __always_inline bool
-riscv_has_extension_likely(const unsigned long ext)
+__riscv_has_extension_likely_alternatives(const unsigned long vendor, const 
unsigned long ext)
 {
-   compiletime_assert(ext < RISCV_ISA_EXT_MAX,
-  "ext must be < RISCV_ISA_EXT_MAX");
-
-   if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
-   asm goto(
-   ALTERNATIVE("j  %l[l_no]", "nop", 0, %[ext], 1)
-   :
-   : [ext] "i" (ext)
-   :
-   : l_no);
-   } else {
-   if (!__riscv_isa_extension_available(NULL, ext))
-   goto l_no;
-   }
+   asm goto(ALTERNATIVE("j %l[l_no]", "nop", %[vendor], %[ext], 1)
+   :
+   : [vendor] "i" (vendor), [ext] "i" (ext)
+   :
+   : l_no);
 
return true;
 l_no:
@@ -134,42 +130,118 @@ riscv_has_extension_likely(const unsigned long ext)
 }
 
 static __always_inline bool
-riscv_has_extension_unlikely(const unsigned long ext)
+__riscv_has_extension_unlikely_alternatives(const unsigned long vendor, const 
unsigned long ext)
 {
-   compiletime_assert(ext < RISCV_ISA_EXT_MAX,
-  "ext must be < RISCV_ISA_EXT_MAX");
-
-   if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
-   asm goto(
-   ALTERNATIVE("nop", "j   %l[l_yes]", 0, %[ext], 1)
-   :
-   : [ext] "i" (ext)
-   :
-   : l_yes);
-   } else {
-   if (__riscv_isa_extension_available(NULL, ext))
-   goto l_yes;
-   }
+   asm goto(ALTERNATIVE("nop", "j  %l[l_yes]", %[vendor], %[ext], 1)
+   :
+   : [vendor] "i" (vendor), [ext] "i" (ext)
+   :
+   : l_yes);
 
return false;
 l_yes:
return true;
 }
 
+/* Standard extension helpers */
+
+static __always_inline bool
+riscv_has_extension_likely(const unsigned long ext)
+{
+   compiletime_assert(ext < RISCV_ISA_EXT_MAX,
+  "ext must be < RISCV_ISA_EXT_MAX");
+
+   if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE))
+   return 

[PATCH v2 06/17] riscv: Extend cpufeature.c to detect vendor extensions

2024-04-15 Thread Charlie Jenkins
Create a private namespace for each vendor above 0x8000. During the
probing of hardware capabilities, the vendorid of each hart is used to
resolve the vendor extension compatibility.

Signed-off-by: Charlie Jenkins 
---
 arch/riscv/include/asm/cpufeature.h|  28 +
 arch/riscv/include/asm/hwcap.h |  23 
 arch/riscv/include/asm/vendor_extensions.h |  15 +++
 arch/riscv/kernel/Makefile |   2 +
 arch/riscv/kernel/cpufeature.c | 136 +++--
 arch/riscv/kernel/vendor_extensions.c  |  32 +
 arch/riscv/kernel/vendor_extensions/Makefile   |   3 +
 .../kernel/vendor_extensions/thead_extensions.c|  13 ++
 8 files changed, 214 insertions(+), 38 deletions(-)
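
For context, the per-vendor table pattern this patch introduces (a sketch of
the new thead_extensions.c listed in the diffstat above; the Andes equivalent
appears later in the series):

  /* arch/riscv/kernel/vendor_extensions/thead_extensions.c (shape only) */
  const struct riscv_isa_ext_data riscv_isa_vendor_ext_thead[] = {
          __RISCV_ISA_EXT_DATA(xtheadvector, RISCV_ISA_VENDOR_EXT_XTHEADVECTOR),
  };

  const size_t riscv_isa_vendor_ext_count_thead = ARRAY_SIZE(riscv_isa_vendor_ext_thead);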

diff --git a/arch/riscv/include/asm/cpufeature.h 
b/arch/riscv/include/asm/cpufeature.h
index 347805446151..50fa174cccb9 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -26,13 +26,41 @@ struct riscv_isainfo {
DECLARE_BITMAP(isa, RISCV_ISA_EXT_MAX);
 };
 
+struct riscv_isavendorinfo {
+   DECLARE_BITMAP(isa, RISCV_ISA_VENDOR_EXT_SIZE);
+};
+
 DECLARE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
 
 /* Per-cpu ISA extensions. */
 extern struct riscv_isainfo hart_isa[NR_CPUS];
 
+/* Per-cpu ISA vendor extensions. */
+extern struct riscv_isainfo hart_isa_vendor[NR_CPUS];
+
+/* Vendor that is associated with hart_isa_vendor */
+extern unsigned long hart_isa_vendorid;
+
 void riscv_user_isa_enable(void);
 
+#define _RISCV_ISA_EXT_DATA(_name, _id, _subset_exts, _subset_exts_size) { 
\
+   .name = #_name, 
\
+   .property = #_name, 
\
+   .id = _id,  
\
+   .subset_ext_ids = _subset_exts, 
\
+   .subset_ext_size = _subset_exts_size
\
+}
+
+#define __RISCV_ISA_EXT_DATA(_name, _id) _RISCV_ISA_EXT_DATA(_name, _id, NULL, 
0)
+
+/* Used to declare pure "lasso" extension (Zk for instance) */
+#define __RISCV_ISA_EXT_BUNDLE(_name, _bundled_exts) \
+   _RISCV_ISA_EXT_DATA(_name, RISCV_ISA_EXT_INVALID, _bundled_exts, 
ARRAY_SIZE(_bundled_exts))
+
+/* Used to declare extensions that are a superset of other extensions (Zvbb 
for instance) */
+#define __RISCV_ISA_EXT_SUPERSET(_name, _id, _sub_exts) \
+   _RISCV_ISA_EXT_DATA(_name, _id, _sub_exts, ARRAY_SIZE(_sub_exts))
+
 #if defined(CONFIG_RISCV_MISALIGNED)
 bool check_unaligned_access_emulated_all_cpus(void);
 void unaligned_emulation_finish(void);
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index e17d0078a651..38157be5becd 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -87,6 +87,29 @@
 #define RISCV_ISA_EXT_MAX  128
 #define RISCV_ISA_EXT_INVALID  U32_MAX
 
+/*
+ * These macros represent the logical IDs of each vendor RISC-V ISA extension
+ * and are used in each vendor ISA bitmap. The logical IDs start from
+ * RISCV_ISA_VENDOR_EXT_BASE, which allows the 0-0x7999 range to be
+ * reserved for non-vendor extensions. The maximum, RISCV_ISA_VENDOR_EXT_MAX,
+ * is defined in order to allocate the bitmap and may be increased when
+ * necessary.
+ *
+ * Values are expected to overlap between vendors.
+ *
+ * New extensions should just be added to the bottom of the respective vendor,
+ * rather than added alphabetically, in order to avoid unnecessary shuffling.
+ *
+ */
+#define RISCV_ISA_VENDOR_EXT_BASE  0x8000
+
+/* THead Vendor Extensions */
+#define RISCV_ISA_VENDOR_EXT_XTHEADVECTOR  0x8000
+
+#define RISCV_ISA_VENDOR_EXT_MAX   0x8080
+#define RISCV_ISA_VENDOR_EXT_SIZE  (RISCV_ISA_VENDOR_EXT_MAX - 
RISCV_ISA_VENDOR_EXT_BASE)
+#define RISCV_ISA_VENDOR_EXT_INVALID   U32_MAX
+
 #ifdef CONFIG_RISCV_M_MODE
 #define RISCV_ISA_EXT_SxAIARISCV_ISA_EXT_SMAIA
 #else
diff --git a/arch/riscv/include/asm/vendor_extensions.h 
b/arch/riscv/include/asm/vendor_extensions.h
new file mode 100644
index ..0a1955e1c900
--- /dev/null
+++ b/arch/riscv/include/asm/vendor_extensions.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright 2024 Rivos, Inc
+ */
+
+#ifndef _ASM_VENDOR_EXTENSIONS_H
+#define _ASM_VENDOR_EXTENSIONS_H
+
+extern const struct riscv_isa_ext_data riscv_isa_vendor_ext_thead[];
+extern const size_t riscv_isa_vendor_ext_count_thead;
+
+bool get_isa_vendor_ext(unsigned long vendorid, const struct 
riscv_isa_ext_data **isa_vendor_ext,
+   size_t *count);
+
+#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 81d94a8ee10f..53361c50fb46 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -58,6 +58,8 @@ obj-y += riscv_ksyms.o
 

[PATCH v2 05/17] riscv: Fix extension subset checking

2024-04-15 Thread Charlie Jenkins
This loop is supposed to check if ext->subset_ext_ids[j] is valid, rather
than if ext->subset_ext_ids[i] is valid, before setting the extension
id ext->subset_ext_ids[j] in isainfo->isa.

Signed-off-by: Charlie Jenkins 
Reviewed-by: Conor Dooley 
Fixes: 0d8295ed975b ("riscv: add ISA extension parsing for scalar crypto")
---
 arch/riscv/kernel/cpufeature.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index c6e27b45e192..6dff7bb1db3f 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -607,7 +607,7 @@ static int __init riscv_fill_hwcap_from_ext_list(unsigned 
long *isa2hwcap)
 
if (ext->subset_ext_size) {
for (int j = 0; j < ext->subset_ext_size; j++) {
-   if 
(riscv_isa_extension_check(ext->subset_ext_ids[i]))
+   if 
(riscv_isa_extension_check(ext->subset_ext_ids[j]))
set_bit(ext->subset_ext_ids[j], 
isainfo->isa);
}
}

-- 
2.44.0




[PATCH v2 04/17] riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree

2024-04-15 Thread Charlie Jenkins
The D1/D1s SoCs support xtheadvector, which should be included in the
devicetree. Also include the vendorid for the cpu.

Signed-off-by: Charlie Jenkins 
---
 arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi 
b/arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi
index 64c3c2e6cbe0..4788bb50afa2 100644
--- a/arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi
+++ b/arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi
@@ -27,7 +27,8 @@ cpu0: cpu@0 {
riscv,isa = "rv64imafdc";
riscv,isa-base = "rv64i";
riscv,isa-extensions = "i", "m", "a", "f", "d", "c", 
"zicntr", "zicsr",
-  "zifencei", "zihpm";
+  "zifencei", "zihpm", 
"xtheadvector";
+   riscv,vendorid = <0x 0x005b7>;
#cooling-cells = <2>;
 
cpu0_intc: interrupt-controller {

-- 
2.44.0




[PATCH v2 03/17] dt-bindings: riscv: Add vendorid

2024-04-15 Thread Charlie Jenkins
The vendorid is required during DT parsing to determine known hardware
capabilities. This parsing happens before the whole system has booted,
so only the boot hart is online and able to report the value of its
vendorid.

Signed-off-by: Charlie Jenkins 
---
 Documentation/devicetree/bindings/riscv/cpus.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml 
b/Documentation/devicetree/bindings/riscv/cpus.yaml
index d87dd50f1a4b..030c7697d3b7 100644
--- a/Documentation/devicetree/bindings/riscv/cpus.yaml
+++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
@@ -94,6 +94,11 @@ properties:
 description:
   The blocksize in bytes for the Zicboz cache operations.
 
+  riscv,vendorid:
+$ref: /schemas/types.yaml#/definitions/uint64
+description:
+  Same value as the mvendorid CSR.
+
   # RISC-V has multiple properties for cache op block sizes as the sizes
   # differ between individual CBO extensions
   cache-op-block-size: false

-- 
2.44.0




[PATCH v2 02/17] dt-bindings: riscv: Add xtheadvector ISA extension description

2024-04-15 Thread Charlie Jenkins
The xtheadvector ISA extension is described on the T-Head extension spec
Github page [1] at commit 95358cb2cca9.

Link: 
https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361c61d3
35e03d3134b14133f/xtheadvector.adoc [1]

Signed-off-by: Charlie Jenkins 
---
 Documentation/devicetree/bindings/riscv/extensions.yaml | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml 
b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 468c646247aa..99d2a9e8c52d 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -477,6 +477,10 @@ properties:
 latency, as ratified in commit 56ed795 ("Update
 riscv-crypto-spec-vector.adoc") of riscv-crypto.
 
+# vendor extensions, each extension sorted alphanumerically under the
+# vendor they belong to. Vendors are sorted alphanumerically as well.
+
+# Andes
 - const: xandespmu
   description:
 The Andes Technology performance monitor extension for counter 
overflow
@@ -484,5 +488,11 @@ properties:
 Registers in the AX45MP datasheet.
 
https://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
 
+# T-HEAD
+- const: xtheadvector
+  description:
+The T-HEAD specific 0.7.1 vector implementation as written in
+
https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361c61d335e03d3134b14133f/xtheadvector.adoc.
+
 additionalProperties: true
 ...

-- 
2.44.0




[PATCH v2 01/17] riscv: cpufeature: Fix thead vector hwcap removal

2024-04-15 Thread Charlie Jenkins
The riscv_cpuinfo struct that contains mvendorid and marchid is not
populated until all harts are booted, which happens after the DT parsing.
Use the vendorid/archid values from the DT if available, or assume all
harts have the same values as the boot hart as a fallback.

Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older 
T-Head CPUs")
Signed-off-by: Charlie Jenkins 
---
 arch/riscv/include/asm/sbi.h   |  2 ++
 arch/riscv/kernel/cpu.c| 36 
 arch/riscv/kernel/cpufeature.c | 12 ++--
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 6e68f8dff76b..0fab508a65b3 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -370,6 +370,8 @@ static inline int sbi_remote_fence_i(const struct cpumask 
*cpu_mask) { return -1
 static inline void sbi_init(void) {}
 #endif /* CONFIG_RISCV_SBI */
 
+unsigned long riscv_get_mvendorid(void);
+unsigned long riscv_get_marchid(void);
 unsigned long riscv_cached_mvendorid(unsigned int cpu_id);
 unsigned long riscv_cached_marchid(unsigned int cpu_id);
 unsigned long riscv_cached_mimpid(unsigned int cpu_id);
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index d11d6320fb0d..8c8250b98752 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -139,6 +139,34 @@ int riscv_of_parent_hartid(struct device_node *node, 
unsigned long *hartid)
return -1;
 }
 
+unsigned long __init riscv_get_marchid(void)
+{
+   struct riscv_cpuinfo *ci = this_cpu_ptr(&riscv_cpuinfo);
+
+#if IS_ENABLED(CONFIG_RISCV_SBI)
+   ci->marchid = sbi_spec_is_0_1() ? 0 : sbi_get_marchid();
+#elif IS_ENABLED(CONFIG_RISCV_M_MODE)
+   ci->marchid = csr_read(CSR_MARCHID);
+#else
+   ci->marchid = 0;
+#endif
+   return ci->marchid;
+}
+
+unsigned long __init riscv_get_mvendorid(void)
+{
+   struct riscv_cpuinfo *ci = this_cpu_ptr(&riscv_cpuinfo);
+
+#if IS_ENABLED(CONFIG_RISCV_SBI)
+   ci->mvendorid = sbi_spec_is_0_1() ? 0 : sbi_get_mvendorid();
+#elif IS_ENABLED(CONFIG_RISCV_M_MODE)
+   ci->mvendorid = csr_read(CSR_MVENDORID);
+#else
+   ci->mvendorid = 0;
+#endif
+   return ci->mvendorid;
+}
+
 DEFINE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
 
 unsigned long riscv_cached_mvendorid(unsigned int cpu_id)
@@ -170,12 +198,12 @@ static int riscv_cpuinfo_starting(unsigned int cpu)
struct riscv_cpuinfo *ci = this_cpu_ptr(&riscv_cpuinfo);
 
 #if IS_ENABLED(CONFIG_RISCV_SBI)
-   ci->mvendorid = sbi_spec_is_0_1() ? 0 : sbi_get_mvendorid();
-   ci->marchid = sbi_spec_is_0_1() ? 0 : sbi_get_marchid();
+   ci->mvendorid = ci->mvendorid ? ci->mvendorid : sbi_spec_is_0_1() ? 0 : 
sbi_get_mvendorid();
+   ci->marchid = ci->marchid ? ci->marchid : sbi_spec_is_0_1() ? 0 : 
sbi_get_marchid();
ci->mimpid = sbi_spec_is_0_1() ? 0 : sbi_get_mimpid();
 #elif IS_ENABLED(CONFIG_RISCV_M_MODE)
-   ci->mvendorid = csr_read(CSR_MVENDORID);
-   ci->marchid = csr_read(CSR_MARCHID);
+   ci->mvendorid = ci->mvendorid ? ci->mvendorid : csr_read(CSR_MVENDORID);
+   ci->marchid = ci->marchid ? ci->marchid : csr_read(CSR_MARCHID);
ci->mimpid = csr_read(CSR_MIMPID);
 #else
ci->mvendorid = 0;
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 3ed2359eae35..c6e27b45e192 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -490,6 +490,8 @@ static void __init 
riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
struct acpi_table_header *rhct;
acpi_status status;
unsigned int cpu;
+   u64 boot_vendorid;
+   u64 boot_archid;
 
if (!acpi_disabled) {
status = acpi_get_table(ACPI_SIG_RHCT, 0, &rhct);
@@ -497,6 +499,13 @@ static void __init 
riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
return;
}
 
+   /*
+* Naively assume that all harts have the same mvendorid/marchid as the
+* boot hart.
+*/
+   boot_vendorid = riscv_get_mvendorid();
+   boot_archid = riscv_get_marchid();
+
for_each_possible_cpu(cpu) {
struct riscv_isainfo *isainfo = &hart_isa[cpu];
unsigned long this_hwcap = 0;
@@ -544,8 +553,7 @@ static void __init 
riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
 * CPU cores with the ratified spec will contain non-zero
 * marchid.
 */
-   if (acpi_disabled && riscv_cached_mvendorid(cpu) == 
THEAD_VENDOR_ID &&
-   riscv_cached_marchid(cpu) == 0x0) {
+   if (acpi_disabled && boot_vendorid == THEAD_VENDOR_ID && 
boot_archid == 0x0) {
this_hwcap &= ~isa2hwcap[RISCV_ISA_EXT_v];
clear_bit(RISCV_ISA_EXT_v, isainfo->isa);
}

-- 
2.44.0




[PATCH v2 00/17] riscv: Support vendor extensions and xtheadvector

2024-04-15 Thread Charlie Jenkins
This patch series ended up much larger than expected; please bear with
me! The goal here is to support vendor extensions, starting with probing
the device tree and ending with reporting to userspace.

The main design objective was to allow vendors to operate independently
of each other. This has been achieved by delegating vendor extensions to
a new struct "hart_isa_vendor" which is a counterpart to "hart_isa".

Each vendor will have their own list of extensions they support. Each
vendor will have a "namespace" to themselves, which is set at the key
values of 0x8000 - 0x8080. It is up to each vendor's discretion how they
wish to allocate keys in that range for their vendor extensions.

Reporting to userspace follows a similar story, leveraging the hwprobe
syscall. There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_0 that
is used to request supported vendor extensions. The vendor extension
keys are disambiguated by the vendor associated with the cpumask passed
into hwprobe. The entire 64-bit key space is available to each vendor.

On to the xtheadvector specific code. xtheadvector is a custom extension
that is based upon riscv vector version 0.7.1 [1]. All of the vector
routines have been modified to support this alternative vector version
based upon whether xtheadvector was determined to be supported at boot.
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board on 6.9-rc1, so I applied these patches to 6.8. A
couple of minor merge conflicts do arise when doing that, so please let
me know if you have been able to boot this board with a 6.9 kernel. I
used SkiffOS [2] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [3] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.

To test the integration, I used the riscv vector kselftests. I modified
the test cases to be able to more easily extend them, and then added a
xtheadvector target that works by calling hwprobe and swapping out the
vector asm if needed.

[1] 
https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361c61d335e03d3134b14133f/xtheadvector.adoc
[2] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[3] 
https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9a0b48

Signed-off-by: Charlie Jenkins 
---
Changes in v2:
- Added commit hash to xtheadvector
- Simplified riscv,isa vector removal fix to not mess with the DT
  riscv,vendorid
- Moved riscv,vendorid parsing into a different patch and cache the
  value to be used by alternative patching
- Reduce riscv,vendorid missing severity to "info"
- Separate vendor extension list to vendor files
- xtheadvector no longer puts v in the elf_hwcap
- Only patch vendor extension if all harts are associated with the same
  vendor. This is the best chance the kernel has for working properly if
  there are multiple vendors.
- Split hwprobe vendor keys out into vendor file
- Add attribution for Heiko's patches
- Link to v1: 
https://lore.kernel.org/r/20240411-dev-charlie-support_thead_vector_6_9-v1-0-4af9815ec...@rivosinc.com

---
Charlie Jenkins (16):
  riscv: cpufeature: Fix thead vector hwcap removal
  dt-bindings: riscv: Add xtheadvector ISA extension description
  dt-bindings: riscv: Add vendorid
  riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
  riscv: Fix extension subset checking
  riscv: Extend cpufeature.c to detect vendor extensions
  riscv: Introduce vendor variants of extension helpers
  riscv: drivers: Convert xandespmu to use the vendor extension framework
  riscv: uaccess: Add alternative for xtheadvector uaccess
  riscv: csr: Add CSR encodings for VCSR_VXRM/VCSR_VXSAT
  riscv: Create xtheadvector file
  riscv: vector: Support xtheadvector save/restore
  riscv: hwprobe: Add vendor extension probing
  riscv: hwprobe: Document vendor extensions and xtheadvector extension
  selftests: riscv: Fix vector tests
  selftests: riscv: Support xtheadvector in vector tests

Heiko Stuebner (1):
  RISC-V: define the elements of the VCSR vector CSR

 Documentation/arch/riscv/hwprobe.rst   |  12 +
 Documentation/devicetree/bindings/riscv/cpus.yaml  |   5 +
 .../devicetree/bindings/riscv/extensions.yaml  |  10 +
 arch/riscv/Kconfig |   2 +
 arch/riscv/Kconfig.vendor  |  11 +
 arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi  |   3 +-
 arch/riscv/errata/sifive/errata.c  |   2 +
 arch/riscv/errata/thead/errata.c   |   2 +
 arch/riscv/include/asm/cpufeature.h| 170 +---
 arch/riscv/include/asm/csr.h   |  13 +
 arch/riscv/include/asm/hwcap.h |  27 +-
 

Re: [PATCH 02/19] riscv: cpufeature: Fix thead vector hwcap removal

2024-04-15 Thread Charlie Jenkins
On Sat, Apr 13, 2024 at 12:40:26AM +0100, Conor Dooley wrote:
> On Fri, Apr 12, 2024 at 02:31:42PM -0700, Charlie Jenkins wrote:
> > On Fri, Apr 12, 2024 at 10:27:47PM +0100, Conor Dooley wrote:
> > > On Fri, Apr 12, 2024 at 01:48:46PM -0700, Charlie Jenkins wrote:
> > > > On Fri, Apr 12, 2024 at 07:47:48PM +0100, Conor Dooley wrote:
> > > > > On Fri, Apr 12, 2024 at 10:12:20AM -0700, Charlie Jenkins wrote:
> 
> > > > > > This is already falling back on the boot CPU, but that is not a 
> > > > > > solution
> > > > > > that scales. Even though all systems currently have homogenous
> > > > > > marchid/mvendorid I am hesitant to assert that all systems are
> > > > > > homogenous without providing an option to override this.
> > > > > 
> > > > > There are already is an option. Use the non-deprecated property in 
> > > > > your
> > > > > new system for describing what extesions you support. We don't need to
> > > > > add any more properties (for now at least).
> > > > 
> > > > The issue is that it is not possible to know which vendor extensions are
> > > > associated with a vendor. That requires a global namespace where each
> > > > extension can be looked up in a table. I have opted to have a
> > > > vendor-specific namespace so that vendors don't have to worry about
> > > > stepping on other vendor's toes (or the other way around). In order to
> > > > support that, the vendorid of the hart needs to be known prior.
> > > 
> > > Nah, I think you're mixing up something like hwprobe and having
> > > namespaces there with needing namespacing on the devicetree probing side
> > > too. You don't need any vendor namespacing, it's perfectly fine (IMO)
> > > for a vendor to implement someone else's extension and I think we should
> > > allow probing any vendors extension on any CPU.
> > 
> > I am not mixing it up. Sure a vendor can implement somebody else's
> > extension, they just need to add it to their namespace too.
> 
> I didn't mean that you were mixing up how your implementation worked, my
> point was that you're mixing up the hwprobe stuff which may need
> namespacing for $a{b,p}i_reason and probing from DT which does not.
> I don't think that the kernel should need to be changed at all if
> someone shows up and implements another vendor's extension - we already
> have far too many kernel changes required to display support for
> extensions and I don't welcome potential for more.

Yes I understand where you are coming from. We do not want it to require
very many changes to add an extension. With this framework, there are
the same number of changes to add a vendor extension as there is to add
a standard extension. There is the upfront cost of creating the struct
for the first vendor extension from a vendor, but after that the
extension only needs to be added to the associated vendor's file (I am
extracting this out to a vendor file in the next version). This is also
a very easy task since the fields from a different vendor can be copied
and adapted.

> 
> Another thing I just thought of was systems where the SoC vendor
> implements some extension that gets communicated in the ISA string but
> is not the vendor in mvendorid in their various CPUs. I wouldn't want to
> see several different entries in structs (or several different hwprobe
> keys, but that's another story) for this situation because you're only
> allowing probing what's in the struct matching the vendorid.

Since the isa string is a per-hart field, the vendor associated with the
hart will be used.

- Charlie




Re: [PATCH v4 27/27] docs: ntsync: Add documentation for the ntsync uAPI.

2024-04-15 Thread Randy Dunlap



On 4/15/24 6:08 PM, Elizabeth Figura wrote:
> Add an overall explanation of the driver architecture, and complete and 
> precise
> specification for its intended behaviour.
> 
> Reviewed-by: Bagas Sanjaya 
> Signed-off-by: Elizabeth Figura 

Tested-by: Randy Dunlap 

Thanks.

> ---
>  Documentation/userspace-api/index.rst  |   1 +
>  Documentation/userspace-api/ntsync.rst | 399 +
>  2 files changed, 400 insertions(+)
>  create mode 100644 Documentation/userspace-api/ntsync.rst

-- 
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html



[PATCH v4 27/27] docs: ntsync: Add documentation for the ntsync uAPI.

2024-04-15 Thread Elizabeth Figura
Add an overall explanation of the driver architecture, and complete and precise
specification for its intended behaviour.

Reviewed-by: Bagas Sanjaya 
Signed-off-by: Elizabeth Figura 
---
 Documentation/userspace-api/index.rst  |   1 +
 Documentation/userspace-api/ntsync.rst | 399 +
 2 files changed, 400 insertions(+)
 create mode 100644 Documentation/userspace-api/ntsync.rst
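
For a quick feel of the interface documented below, a minimal user-space
sketch (hypothetical snippet; assumes the series' uapi header installs as
<linux/ntsync.h> and, per the documentation, that the created object's file
descriptor is returned in args.sem):

  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/ntsync.h>

  /* Create a binary semaphore (count 1, max 1) on a fresh ntsync instance. */
  int make_sem(void)
  {
          struct ntsync_sem_args args = { .count = 1, .max = 1 };
          int dev;

          dev = open("/dev/ntsync", O_RDONLY | O_CLOEXEC);
          if (dev < 0)
                  return -1;

          if (ioctl(dev, NTSYNC_IOC_CREATE_SEM, &args) < 0) {
                  close(dev);
                  return -1;
          }

          return args.sem;  /* file descriptor for the new semaphore object */
  }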

diff --git a/Documentation/userspace-api/index.rst 
b/Documentation/userspace-api/index.rst
index afecfe3cc4a8..d5745a500fa7 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -62,6 +62,7 @@ Everything else
vduse
futex2
perf_ring_buffer
+   ntsync
 
 .. only::  subproject and html
 
diff --git a/Documentation/userspace-api/ntsync.rst 
b/Documentation/userspace-api/ntsync.rst
new file mode 100644
index ..202c2350d3af
--- /dev/null
+++ b/Documentation/userspace-api/ntsync.rst
@@ -0,0 +1,399 @@
+===
+NT synchronization primitive driver
+===
+
+This page documents the user-space API for the ntsync driver.
+
+ntsync is a support driver for emulation of NT synchronization
+primitives by user-space NT emulators. It exists because implementation
+in user-space, using existing tools, cannot match Windows performance
+while offering accurate semantics. It is implemented entirely in
+software, and does not drive any hardware device.
+
+This interface is meant as a compatibility tool only, and should not
+be used for general synchronization. Instead use generic, versatile
+interfaces such as futex(2) and poll(2).
+
+Synchronization primitives
+==
+
+The ntsync driver exposes three types of synchronization primitives:
+semaphores, mutexes, and events.
+
+A semaphore holds a single volatile 32-bit counter, and a static 32-bit
+integer denoting the maximum value. It is considered signaled when the
+counter is nonzero. The counter is decremented by one when a wait is
+satisfied. Both the initial and maximum count are established when the
+semaphore is created.
+
+A mutex holds a volatile 32-bit recursion count, and a volatile 32-bit
+identifier denoting its owner. A mutex is considered signaled when its
+owner is zero (indicating that it is not owned). The recursion count is
+incremented when a wait is satisfied, and ownership is set to the given
+identifier.
+
+A mutex also holds an internal flag denoting whether its previous owner
+has died; such a mutex is said to be abandoned. Owner death is not
+tracked automatically based on thread death, but rather must be
+communicated using ``NTSYNC_IOC_MUTEX_KILL``. An abandoned mutex is
+inherently considered unowned.
+
+Except for the "unowned" semantics of zero, the actual value of the
+owner identifier is not interpreted by the ntsync driver at all. The
+intended use is to store a thread identifier; however, the ntsync
+driver does not actually validate that a calling thread provides
+consistent or unique identifiers.
+
+An event holds a volatile boolean state denoting whether it is signaled
+or not. There are two types of events, auto-reset and manual-reset. An
+auto-reset event is designaled when a wait is satisfied; a manual-reset
+event is not. The event type is specified when the event is created.
+
+Unless specified otherwise, all operations on an object are atomic and
+totally ordered with respect to other operations on the same object.
+
+Objects are represented by files. When all file descriptors to an
+object are closed, that object is deleted.
+
+Char device
+===========
+
+The ntsync driver creates a single char device /dev/ntsync. Each file
+description opened on the device represents a unique instance intended
+to back an individual NT virtual machine. Objects created by one ntsync
+instance may only be used with other objects created by the same
+instance.
+
+ioctl reference
+===============
+
+All operations on the device are done through ioctls. There are four
+structures used in ioctl calls::
+
+   struct ntsync_sem_args {
+   __u32 sem;
+   __u32 count;
+   __u32 max;
+   };
+
+   struct ntsync_mutex_args {
+   __u32 mutex;
+   __u32 owner;
+   __u32 count;
+   };
+
+   struct ntsync_event_args {
+   __u32 event;
+   __u32 signaled;
+   __u32 manual;
+   };
+
+   struct ntsync_wait_args {
+   __u64 timeout;
+   __u64 objs;
+   __u32 count;
+   __u32 owner;
+   __u32 index;
+   __u32 alert;
+   __u32 flags;
+   __u32 pad;
+   };
+
+Depending on the ioctl, members of the structure may be used as input,
+output, or not at all. All ioctls return 0 on success.
+
+The ioctls on the device file are as follows:
+
+.. c:macro:: NTSYNC_IOC_CREATE_SEM
+
+  Create a semaphore object. Takes a pointer to struct
+  :c:type:`ntsync_sem_args`, which is used as follows:
+
+  .. list-table::
+
+ * - ``sem``
+   - On output, contains a file 
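For readers skimming the reference above, here is a minimal user-space sketch of the
documented flow (illustrative only; it assumes the uapi header from this series is
installed as <linux/ntsync.h> and omits most error handling):

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/ntsync.h>

    int ntsync_demo(void)
    {
        struct ntsync_sem_args sem = { .count = 1, .max = 2 };
        struct ntsync_wait_args wait = {0};
        int dev, obj;

        dev = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
        if (dev < 0)
            return -1;

        /* On success the new semaphore's fd is returned in sem.sem. */
        if (ioctl(dev, NTSYNC_IOC_CREATE_SEM, &sem) < 0)
            return -1;
        obj = sem.sem;

        wait.timeout = UINT64_MAX;      /* wait forever */
        wait.objs = (uintptr_t)&obj;    /* array of one object fd */
        wait.count = 1;
        wait.owner = 123;               /* any nonzero thread identifier */

        /* Consumes one count from the semaphore; wait.index is 0 here. */
        return ioctl(dev, NTSYNC_IOC_WAIT_ANY, &wait);
    }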

[PATCH v4 20/27] selftests: ntsync: Add some tests for manual-reset event state.

2024-04-15 Thread Elizabeth Figura
Test event-specific ioctls NTSYNC_IOC_EVENT_SET, NTSYNC_IOC_EVENT_RESET,
NTSYNC_IOC_EVENT_PULSE, NTSYNC_IOC_EVENT_READ for manual-reset events, and
waiting on manual-reset events.

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 89 +++
 1 file changed, 89 insertions(+)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index b77fb0b2c4b1..b6481c2b85cc 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -73,6 +73,27 @@ static int unlock_mutex(int mutex, __u32 owner, __u32 *count)
return ret;
 }
 
+static int read_event_state(int event, __u32 *signaled, __u32 *manual)
+{
+   struct ntsync_event_args args;
+   int ret;
+
+   memset(&args, 0xcc, sizeof(args));
+   ret = ioctl(event, NTSYNC_IOC_EVENT_READ, &args);
+   *signaled = args.signaled;
+   *manual = args.manual;
+   return ret;
+}
+
+#define check_event_state(event, signaled, manual) \
+   ({ \
+   __u32 __signaled, __manual; \
+   int ret = read_event_state((event), &__signaled, &__manual); \
+   EXPECT_EQ(0, ret); \
+   EXPECT_EQ((signaled), __signaled); \
+   EXPECT_EQ((manual), __manual); \
+   })
+
 static int wait_objs(int fd, unsigned long request, __u32 count,
 const int *objs, __u32 owner, __u32 *index)
 {
@@ -353,6 +374,74 @@ TEST(mutex_state)
close(fd);
 }
 
+TEST(manual_event_state)
+{
+   struct ntsync_event_args event_args;
+   __u32 index, signaled;
+   int fd, event, ret;
+
+   fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
+   ASSERT_LE(0, fd);
+
+   event_args.manual = 1;
+   event_args.signaled = 0;
+   event_args.event = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_EVENT, &event_args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, event_args.event);
+   event = event_args.event;
+   check_event_state(event, 0, 1);
+
+   signaled = 0xdeadbeef;
+   ret = ioctl(event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, signaled);
+   check_event_state(event, 1, 1);
+
+   ret = ioctl(event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, signaled);
+   check_event_state(event, 1, 1);
+
+   ret = wait_any(fd, 1, &event, 123, &index);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_event_state(event, 1, 1);
+
+   signaled = 0xdeadbeef;
+   ret = ioctl(event, NTSYNC_IOC_EVENT_RESET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, signaled);
+   check_event_state(event, 0, 1);
+
+   ret = ioctl(event, NTSYNC_IOC_EVENT_RESET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, signaled);
+   check_event_state(event, 0, 1);
+
+   ret = wait_any(fd, 1, &event, 123, &index);
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(ETIMEDOUT, errno);
+
+   ret = ioctl(event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, signaled);
+
+   ret = ioctl(event, NTSYNC_IOC_EVENT_PULSE, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, signaled);
+   check_event_state(event, 0, 1);
+
+   ret = ioctl(event, NTSYNC_IOC_EVENT_PULSE, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, signaled);
+   check_event_state(event, 0, 1);
+
+   close(event);
+
+   close(fd);
+}
+
 TEST(test_wait_any)
 {
int objs[NTSYNC_MAX_WAIT_COUNT + 1], fd, ret;
-- 
2.43.0




[PATCH v4 26/27] maintainers: Add an entry for ntsync.

2024-04-15 Thread Elizabeth Figura
Add myself as maintainer, supported by CodeWeavers.

Signed-off-by: Elizabeth Figura 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 41a013dfebbc..09ae011a8d91 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15728,6 +15728,15 @@ T: git 
https://github.com/Paragon-Software-Group/linux-ntfs3.git
 F: Documentation/filesystems/ntfs3.rst
 F: fs/ntfs3/
 
+NTSYNC SYNCHRONIZATION PRIMITIVE DRIVER
+M: Elizabeth Figura 
+L: wine-de...@winehq.org
+S: Supported
+F: Documentation/userspace-api/ntsync.rst
+F: drivers/misc/ntsync.c
+F: include/uapi/linux/ntsync.h
+F: tools/testing/selftests/drivers/ntsync/
+
 NUBUS SUBSYSTEM
 M: Finn Thain 
 L: linux-m...@lists.linux-m68k.org
-- 
2.43.0




[PATCH v4 23/27] selftests: ntsync: Add tests for alertable waits.

2024-04-15 Thread Elizabeth Figura
Test the "alert" functionality of NTSYNC_IOC_WAIT_ALL and NTSYNC_IOC_WAIT_ANY,
when a wait is woken with an alert and when it is woken by an object.

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 179 +-
 1 file changed, 176 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index 5d17eff6a370..5465a16d38b3 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -95,7 +95,7 @@ static int read_event_state(int event, __u32 *signaled, __u32 
*manual)
})
 
 static int wait_objs(int fd, unsigned long request, __u32 count,
-const int *objs, __u32 owner, __u32 *index)
+const int *objs, __u32 owner, int alert, __u32 *index)
 {
struct ntsync_wait_args args = {0};
struct timespec timeout;
@@ -108,6 +108,7 @@ static int wait_objs(int fd, unsigned long request, __u32 
count,
args.objs = (uintptr_t)objs;
args.owner = owner;
args.index = 0xdeadbeef;
+   args.alert = alert;
ret = ioctl(fd, request, &args);
*index = args.index;
return ret;
@@ -115,12 +116,26 @@ static int wait_objs(int fd, unsigned long request, __u32 
count,
 
 static int wait_any(int fd, __u32 count, const int *objs, __u32 owner, __u32 
*index)
 {
-   return wait_objs(fd, NTSYNC_IOC_WAIT_ANY, count, objs, owner, index);
+   return wait_objs(fd, NTSYNC_IOC_WAIT_ANY, count, objs, owner, 0, index);
 }
 
 static int wait_all(int fd, __u32 count, const int *objs, __u32 owner, __u32 
*index)
 {
-   return wait_objs(fd, NTSYNC_IOC_WAIT_ALL, count, objs, owner, index);
+   return wait_objs(fd, NTSYNC_IOC_WAIT_ALL, count, objs, owner, 0, index);
+}
+
+static int wait_any_alert(int fd, __u32 count, const int *objs,
+ __u32 owner, int alert, __u32 *index)
+{
+   return wait_objs(fd, NTSYNC_IOC_WAIT_ANY,
+count, objs, owner, alert, index);
+}
+
+static int wait_all_alert(int fd, __u32 count, const int *objs,
+ __u32 owner, int alert, __u32 *index)
+{
+   return wait_objs(fd, NTSYNC_IOC_WAIT_ALL,
+count, objs, owner, alert, index);
 }
 
 TEST(semaphore_state)
@@ -1095,4 +1110,162 @@ TEST(wake_all)
close(fd);
 }
 
+TEST(alert_any)
+{
+   struct ntsync_event_args event_args = {0};
+   struct ntsync_sem_args sem_args = {0};
+   __u32 index, count, signaled;
+   int objs[2], fd, ret;
+
+   fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
+   ASSERT_LE(0, fd);
+
+   sem_args.count = 0;
+   sem_args.max = 2;
+   sem_args.sem = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_SEM, &sem_args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, sem_args.sem);
+   objs[0] = sem_args.sem;
+
+   sem_args.count = 1;
+   sem_args.max = 2;
+   sem_args.sem = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_SEM, &sem_args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, sem_args.sem);
+   objs[1] = sem_args.sem;
+
+   event_args.manual = true;
+   event_args.signaled = true;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_EVENT, &event_args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_any_alert(fd, 0, NULL, 123, event_args.event, &index);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_RESET, &signaled);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_any_alert(fd, 0, NULL, 123, event_args.event, &index);
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(ETIMEDOUT, errno);
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_any_alert(fd, 2, objs, 123, event_args.event, &index);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, index);
+
+   ret = wait_any_alert(fd, 2, objs, 123, event_args.event, &index);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(2, index);
+
+   close(event_args.event);
+
+   /* test with an auto-reset event */
+
+   event_args.manual = false;
+   event_args.signaled = true;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_EVENT, &event_args);
+   EXPECT_EQ(0, ret);
+
+   count = 1;
+   ret = post_sem(objs[0], &count);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_any_alert(fd, 2, objs, 123, event_args.event, &index);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+
+   ret = wait_any_alert(fd, 2, objs, 123, event_args.event, &index);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(2, index);
+
+   ret = wait_any_alert(fd, 2, objs, 123, event_args.event, &index);
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(ETIMEDOUT, errno);
+
+   close(event_args.event);
+
+   close(objs[0]);
+   close(objs[1]);
+
+   close(fd);
+}
+
+TEST(alert_all)
+{
+   struct ntsync_event_args event_args = {0};
+   struct ntsync_sem_args sem_args = {0};

[PATCH v4 00/27] NT synchronization primitive driver

2024-04-15 Thread Elizabeth Figura
This patch series implements a new char misc driver, /dev/ntsync, which is used
to implement Windows NT synchronization primitives.

NT synchronization primitives are unique in that the wait functions both are
vectored, operate on multiple types of object with different behaviour (mutex,
semaphore, event), and affect the state of the objects they wait on. This model
is not compatible with existing kernel synchronization objects or interfaces,
and therefore the ntsync driver implements its own wait queues and locking.

Hence I would like to request review from someone familiar with locking to make
sure that the usage of low-level kernel primitives is correct and that the wait
queues work as intended, and to that end I've CC'd the locking maintainers.

== Background ==

The Wine project emulates the Windows API in user space. One particular part of
that API, namely the NT synchronization primitives, have historically been
implemented via RPC to a dedicated "kernel" process. However, more recent
applications use these APIs more strenuously, and the overhead of RPC has become
a bottleneck.

The NT synchronization APIs are too complex to implement on top of existing
primitives without sacrificing correctness. Certain operations, such as
NtPulseEvent() or the "wait-for-all" mode of NtWaitForMultipleObjects(), require
direct control over the underlying wait queue, and implementing a wait queue
sufficiently robust for Wine in user space is not possible. This proposed
driver, therefore, implements the problematic interfaces directly in the Linux
kernel.

This driver was presented at Linux Plumbers Conference 2023. For those further
interested in the history of synchronization in Wine and past attempts to solve
this problem in user space, a recording of the presentation can be viewed here:

https://www.youtube.com/watch?v=NjU4nyWyhU8


== Performance ==

The gain in performance varies wildly depending on the application in question
and the user's hardware. For some games NT synchronization is not a bottleneck
and no change can be observed, but for others frame rate improvements of 50 to
150 percent are not atypical. The following table lists frame rate measurements
from a variety of games on a variety of hardware, taken by users Dmitry
Skvortsov, FuzzyQuils, OnMars, and myself:

Game                             Upstream    ntsync    improvement
===================================================================
Anger Foot                             69        99            43%
Call of Juarez                       99.8     224.1           125%
Dirt 3                              110.6     860.7           678%
Forza Horizon 5                       108       160            48%
Lara Croft: Temple of Osiris          141       326           131%
Metro 2033                          164.4     199.2            21%
Resident Evil 2                        26        77           196%
The Crew                               26        51            96%
Tiny Tina's Wonderlands               130       360           177%
Total War Saga: Troy                  109       146            34%
===================================================================


== Patches ==

The intended semantics of the patches are broadly intended to match those of the
corresponding Windows functions. For those not already familiar with the Windows
functions (or their undocumented behaviour), patch 27/27 provides a detailed
specification, and individual patches also include a brief description of the
API they are implementing.

The patches making use of this driver in Wine can be retrieved or browsed here:

https://repo.or.cz/wine/zf.git/shortlog/refs/heads/ntsync5


== Implementation ==

Some aspects of the implementation may deserve particular comment:

* In the interest of performance, each object is governed only by a single
  spinlock. However, NTSYNC_IOC_WAIT_ALL requires that the state of multiple
  objects be changed as a single atomic operation. In order to achieve this, we
  first take a device-wide lock ("wait_all_lock") any time we are going to lock
  more than one object at a time.

  The maximum number of objects that can be used in a vectored wait, and
  therefore the maximum that can be locked simultaneously, is 64. This number is
  NT's own limit.

  The acquisition of multiple spinlocks will degrade performance. This is a
  conscious choice, however. Wait-for-all is known to be a very rare operation
  in practice, especially with counts that approach the maximum, and it is the
  intent of the ntsync driver to optimize wait-for-any at the expense of
  wait-for-all as much as possible (a short sketch of this locking pattern follows
  after this list).

* NT mutexes are tied to their threads on an OS level, and the kernel includes
  builtin support for "robust" mutexes. In order to keep the ntsync driver
  self-contained and avoid touching more code than necessary, it does not hook
  into task exit nor use pids.

  Instead, the user space emulator is 
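To make the lock ordering in the first bullet above concrete, here is a rough sketch
of the wake path (not the driver's literal code; change_object_state() is a placeholder
for the per-type state update):

    if (atomic_read(&dev->all_hint) > 0) {
        /* Someone may be in a wait-for-all: take the device-wide lock
         * first, then the object lock nested inside it. */
        spin_lock(&dev->wait_all_lock);
        spin_lock_nest_lock(&obj->lock, &dev->wait_all_lock);

        change_object_state(obj);
        try_wake_all_obj(dev, obj);   /* may lock the other waited-on objects */
        try_wake_any_obj(obj);

        spin_unlock(&obj->lock);
        spin_unlock(&dev->wait_all_lock);
    } else {
        /* Common case: only the single per-object spinlock is needed. */
        spin_lock(&obj->lock);
        change_object_state(obj);
        try_wake_any_obj(obj);
        spin_unlock(&obj->lock);
    }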

[PATCH v4 24/27] selftests: ntsync: Add some tests for wakeup signaling via alerts.

2024-04-15 Thread Elizabeth Figura
Expand the alert tests to cover alerting a thread mid-wait, to test that the
relevant scheduling logic works correctly.

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 62 +++
 1 file changed, 62 insertions(+)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index 5465a16d38b3..968874d7e325 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -1113,9 +1113,12 @@ TEST(wake_all)
 TEST(alert_any)
 {
struct ntsync_event_args event_args = {0};
+   struct ntsync_wait_args wait_args = {0};
struct ntsync_sem_args sem_args = {0};
__u32 index, count, signaled;
+   struct wait_args thread_args;
int objs[2], fd, ret;
+   pthread_t thread;
 
fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
ASSERT_LE(0, fd);
@@ -1163,6 +1166,34 @@ TEST(alert_any)
EXPECT_EQ(0, ret);
EXPECT_EQ(2, index);
 
+   /* test wakeup via alert */
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_RESET, &signaled);
+   EXPECT_EQ(0, ret);
+
+   wait_args.timeout = get_abs_timeout(1000);
+   wait_args.objs = (uintptr_t)objs;
+   wait_args.count = 2;
+   wait_args.owner = 123;
+   wait_args.index = 0xdeadbeef;
+   wait_args.alert = event_args.event;
+   thread_args.fd = fd;
+   thread_args.args = &wait_args;
+   thread_args.request = NTSYNC_IOC_WAIT_ANY;
+   ret = pthread_create(&thread, NULL, wait_thread, &thread_args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, thread_args.ret);
+   EXPECT_EQ(2, wait_args.index);
+
close(event_args.event);
 
/* test with an auto-reset event */
@@ -1199,9 +1230,12 @@ TEST(alert_any)
 TEST(alert_all)
 {
struct ntsync_event_args event_args = {0};
+   struct ntsync_wait_args wait_args = {0};
struct ntsync_sem_args sem_args = {0};
+   struct wait_args thread_args;
__u32 index, count, signaled;
int objs[2], fd, ret;
+   pthread_t thread;
 
fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
ASSERT_LE(0, fd);
@@ -1235,6 +1269,34 @@ TEST(alert_all)
EXPECT_EQ(0, ret);
EXPECT_EQ(2, index);
 
+   /* test wakeup via alert */
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_RESET, &signaled);
+   EXPECT_EQ(0, ret);
+
+   wait_args.timeout = get_abs_timeout(1000);
+   wait_args.objs = (uintptr_t)objs;
+   wait_args.count = 2;
+   wait_args.owner = 123;
+   wait_args.index = 0xdeadbeef;
+   wait_args.alert = event_args.event;
+   thread_args.fd = fd;
+   thread_args.args = &wait_args;
+   thread_args.request = NTSYNC_IOC_WAIT_ALL;
+   ret = pthread_create(&thread, NULL, wait_thread, &thread_args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, thread_args.ret);
+   EXPECT_EQ(2, wait_args.index);
+
close(event_args.event);
 
/* test with an auto-reset event */
-- 
2.43.0




[PATCH v4 09/27] ntsync: Introduce NTSYNC_IOC_EVENT_PULSE.

2024-04-15 Thread Elizabeth Figura
This corresponds to the NT syscall NtPulseEvent().

This wakes up any waiters as if the event had been set, but does not set the
event, instead resetting it if it had been signalled. Thus, for a manual-reset
event, all waiters are woken, whereas for an auto-reset event, at most one
waiter is woken.
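As a hedged user-space illustration of how this differs from NTSYNC_IOC_EVENT_SET
(event_fd is assumed to come from NTSYNC_IOC_CREATE_EVENT):

    __u32 prev;

    /* Wakes eligible waiters and leaves the event signaled. */
    ioctl(event_fd, NTSYNC_IOC_EVENT_SET, &prev);

    /* Wakes eligible waiters the same way, but the event ends up
     * unsignaled; prev still reports the state before the call. */
    ioctl(event_fd, NTSYNC_IOC_EVENT_PULSE, &prev);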

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 10 --
 include/uapi/linux/ntsync.h |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index ae78425c87d1..adba4657bf26 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -473,7 +473,7 @@ static int ntsync_mutex_kill(struct ntsync_obj *mutex, void 
__user *argp)
return ret;
 }
 
-static int ntsync_event_set(struct ntsync_obj *event, void __user *argp)
+static int ntsync_event_set(struct ntsync_obj *event, void __user *argp, bool 
pulse)
 {
struct ntsync_device *dev = event->dev;
__u32 prev_state;
@@ -489,6 +489,8 @@ static int ntsync_event_set(struct ntsync_obj *event, void 
__user *argp)
event->u.event.signaled = true;
try_wake_all_obj(dev, event);
try_wake_any_event(event);
+   if (pulse)
+   event->u.event.signaled = false;
 
spin_unlock(&event->lock);
spin_unlock(&dev->wait_all_lock);
@@ -498,6 +500,8 @@ static int ntsync_event_set(struct ntsync_obj *event, void 
__user *argp)
prev_state = event->u.event.signaled;
event->u.event.signaled = true;
try_wake_any_event(event);
+   if (pulse)
+   event->u.event.signaled = false;
 
spin_unlock(&event->lock);
}
@@ -552,9 +556,11 @@ static long ntsync_obj_ioctl(struct file *file, unsigned 
int cmd,
case NTSYNC_IOC_MUTEX_KILL:
return ntsync_mutex_kill(obj, argp);
case NTSYNC_IOC_EVENT_SET:
-   return ntsync_event_set(obj, argp);
+   return ntsync_event_set(obj, argp, false);
case NTSYNC_IOC_EVENT_RESET:
return ntsync_event_reset(obj, argp);
+   case NTSYNC_IOC_EVENT_PULSE:
+   return ntsync_event_set(obj, argp, true);
default:
return -ENOIOCTLCMD;
}
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 657542107328..57721f5d31ba 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -54,5 +54,6 @@ struct ntsync_wait_args {
 #define NTSYNC_IOC_MUTEX_KILL  _IOW ('N', 0x86, __u32)
 #define NTSYNC_IOC_EVENT_SET   _IOR ('N', 0x88, __u32)
 #define NTSYNC_IOC_EVENT_RESET _IOR ('N', 0x89, __u32)
+#define NTSYNC_IOC_EVENT_PULSE _IOR ('N', 0x8a, __u32)
 
 #endif
-- 
2.43.0




[PATCH v4 25/27] selftests: ntsync: Add a stress test for contended waits.

2024-04-15 Thread Elizabeth Figura
Test a more realistic usage pattern, and one with heavy contention, in order to
actually exercise ntsync's internal synchronization.

This test has several threads in a tight loop acquiring a mutex, modifying some
shared data, and then releasing the mutex. At the end we check if the data is
consistent.

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 74 +++
 1 file changed, 74 insertions(+)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index 968874d7e325..5fa2c9a0768c 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -1330,4 +1330,78 @@ TEST(alert_all)
close(fd);
 }
 
+#define STRESS_LOOPS 10000
+#define STRESS_THREADS 4
+
+static unsigned int stress_counter;
+static int stress_device, stress_start_event, stress_mutex;
+
+static void *stress_thread(void *arg)
+{
+   struct ntsync_wait_args wait_args = {0};
+   __u32 index, count, i;
+   int ret;
+
+   wait_args.timeout = UINT64_MAX;
+   wait_args.count = 1;
+   wait_args.objs = (uintptr_t)&stress_start_event;
+   wait_args.owner = gettid();
+   wait_args.index = 0xdeadbeef;
+
+   ioctl(stress_device, NTSYNC_IOC_WAIT_ANY, &wait_args);
+
+   wait_args.objs = (uintptr_t)&stress_mutex;
+
+   for (i = 0; i < STRESS_LOOPS; ++i) {
+   ioctl(stress_device, NTSYNC_IOC_WAIT_ANY, &wait_args);
+
+   ++stress_counter;
+
+   unlock_mutex(stress_mutex, wait_args.owner, &count);
+   }
+
+   return NULL;
+}
+
+TEST(stress_wait)
+{
+   struct ntsync_event_args event_args;
+   struct ntsync_mutex_args mutex_args;
+   pthread_t threads[STRESS_THREADS];
+   __u32 signaled, i;
+   int ret;
+
+   stress_device = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
+   ASSERT_LE(0, stress_device);
+
+   mutex_args.owner = 0;
+   mutex_args.count = 0;
+   ret = ioctl(stress_device, NTSYNC_IOC_CREATE_MUTEX, &mutex_args);
+   EXPECT_EQ(0, ret);
+   stress_mutex = mutex_args.mutex;
+
+   event_args.manual = 1;
+   event_args.signaled = 0;
+   ret = ioctl(stress_device, NTSYNC_IOC_CREATE_EVENT, &event_args);
+   EXPECT_EQ(0, ret);
+   stress_start_event = event_args.event;
+
+   for (i = 0; i < STRESS_THREADS; ++i)
+   pthread_create(&threads[i], NULL, stress_thread, NULL);
+
+   ret = ioctl(stress_start_event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+
+   for (i = 0; i < STRESS_THREADS; ++i) {
+   ret = pthread_join(threads[i], NULL);
+   EXPECT_EQ(0, ret);
+   }
+
+   EXPECT_EQ(STRESS_LOOPS * STRESS_THREADS, stress_counter);
+
+   close(stress_start_event);
+   close(stress_mutex);
+   close(stress_device);
+}
+
 TEST_HARNESS_MAIN
-- 
2.43.0




[PATCH v4 22/27] selftests: ntsync: Add some tests for wakeup signaling with events.

2024-04-15 Thread Elizabeth Figura
Expand the contended wait tests, which previously only covered mutexes and
semaphores, to cover events as well.

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 151 +-
 1 file changed, 147 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index 12ccb4ec28e4..5d17eff6a370 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -622,6 +622,7 @@ TEST(test_wait_any)
 
 TEST(test_wait_all)
 {
+   struct ntsync_event_args event_args = {0};
struct ntsync_mutex_args mutex_args = {0};
struct ntsync_sem_args sem_args = {0};
__u32 owner, index, count;
@@ -644,6 +645,11 @@ TEST(test_wait_all)
EXPECT_EQ(0, ret);
EXPECT_NE(0xdeadbeef, mutex_args.mutex);
 
+   event_args.manual = true;
+   event_args.signaled = true;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_EVENT, &event_args);
+   EXPECT_EQ(0, ret);
+
objs[0] = sem_args.sem;
objs[1] = mutex_args.mutex;
 
@@ -692,6 +698,14 @@ TEST(test_wait_all)
check_sem_state(sem_args.sem, 1, 3);
check_mutex_state(mutex_args.mutex, 1, 123);
 
+   objs[0] = sem_args.sem;
+   objs[1] = event_args.event;
+   ret = wait_all(fd, 2, objs, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_sem_state(sem_args.sem, 0, 3);
+   check_event_state(event_args.event, 1, 1);
+
/* test waiting on the same object twice */
objs[0] = objs[1] = sem_args.sem;
ret = wait_all(fd, 2, objs, 123, );
@@ -700,6 +714,7 @@ TEST(test_wait_all)
 
close(sem_args.sem);
close(mutex_args.mutex);
+   close(event_args.event);
 
close(fd);
 }
@@ -746,12 +761,13 @@ static int wait_for_thread(pthread_t thread, unsigned int 
ms)
 
 TEST(wake_any)
 {
+   struct ntsync_event_args event_args = {0};
struct ntsync_mutex_args mutex_args = {0};
struct ntsync_wait_args wait_args = {0};
struct ntsync_sem_args sem_args = {0};
struct wait_args thread_args;
+   __u32 count, index, signaled;
int objs[2], fd, ret;
-   __u32 count, index;
pthread_t thread;
 
fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
@@ -833,10 +849,101 @@ TEST(wake_any)
EXPECT_EQ(0, thread_args.ret);
EXPECT_EQ(1, wait_args.index);
 
+   /* test waking events */
+
+   event_args.manual = false;
+   event_args.signaled = false;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_EVENT, &event_args);
+   EXPECT_EQ(0, ret);
+
+   objs[1] = event_args.event;
+   wait_args.timeout = get_abs_timeout(1000);
+   ret = pthread_create(&thread, NULL, wait_thread, &thread_args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, signaled);
+   check_event_state(event_args.event, 0, 0);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, thread_args.ret);
+   EXPECT_EQ(1, wait_args.index);
+
+   wait_args.timeout = get_abs_timeout(1000);
+   ret = pthread_create(&thread, NULL, wait_thread, &thread_args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_PULSE, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, signaled);
+   check_event_state(event_args.event, 0, 0);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, thread_args.ret);
+   EXPECT_EQ(1, wait_args.index);
+
+   close(event_args.event);
+
+   event_args.manual = true;
+   event_args.signaled = false;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_EVENT, &event_args);
+   EXPECT_EQ(0, ret);
+
+   objs[1] = event_args.event;
+   wait_args.timeout = get_abs_timeout(1000);
+   ret = pthread_create(&thread, NULL, wait_thread, &thread_args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, signaled);
+   check_event_state(event_args.event, 1, 1);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, thread_args.ret);
+   EXPECT_EQ(1, wait_args.index);
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_RESET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, signaled);
+
+   wait_args.timeout = get_abs_timeout(1000);
+   ret = pthread_create(&thread, NULL, wait_thread, &thread_args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   ret = ioctl(event_args.event, NTSYNC_IOC_EVENT_PULSE, &signaled);
+   EXPECT_EQ(0, 

[PATCH v4 13/27] ntsync: Introduce alertable waits.

2024-04-15 Thread Elizabeth Figura
NT waits can optionally be made "alertable". This is a special channel for
thread wakeup that is mildly similar to SIGIO. A thread has an internal single
bit of "alerted" state, and if a thread is alerted while an alertable wait is in
progress, the wait will return a special value, consume the "alerted" state, and will not
consume any of its objects.

Alerts are implemented using events; the user-space NT emulator is expected to
create an internal ntsync event for each thread and pass that event to wait
functions.
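A rough user-space sketch of that convention (dev_fd, the object fds and the
per-thread alert event are assumed to exist already; names are placeholders):

    struct ntsync_wait_args wait = {0};
    int objs[2] = { sem_fd, mutex_fd };
    int ret;

    wait.timeout = UINT64_MAX;
    wait.objs = (uintptr_t)objs;
    wait.count = 2;
    wait.owner = tid;
    wait.alert = alert_event_fd;   /* this thread's internal ntsync event */

    ret = ioctl(dev_fd, NTSYNC_IOC_WAIT_ANY, &wait);
    /* If the wake came from the alert rather than from one of the objects,
     * wait.index equals wait.count (2 here) and no object state was consumed. */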

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 68 -
 include/uapi/linux/ntsync.h |  2 +-
 2 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index a03c6fceb518..19fd70ac3f50 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -819,25 +819,32 @@ static int setup_wait(struct ntsync_device *dev,
  const struct ntsync_wait_args *args, bool all,
  struct ntsync_q **ret_q)
 {
+   int fds[NTSYNC_MAX_WAIT_COUNT + 1];
const __u32 count = args->count;
-   int fds[NTSYNC_MAX_WAIT_COUNT];
struct ntsync_q *q;
+   __u32 total_count;
__u32 i, j;
 
if (!args->owner)
return -EINVAL;
 
-   if (args->pad || args->pad2 || (args->flags & ~NTSYNC_WAIT_REALTIME))
+   if (args->pad || (args->flags & ~NTSYNC_WAIT_REALTIME))
return -EINVAL;
 
if (args->count > NTSYNC_MAX_WAIT_COUNT)
return -EINVAL;
 
+   total_count = count;
+   if (args->alert)
+   total_count++;
+
if (copy_from_user(fds, u64_to_user_ptr(args->objs),
   array_size(count, sizeof(*fds
return -EFAULT;
+   if (args->alert)
+   fds[count] = args->alert;
 
-   q = kmalloc(struct_size(q, entries, count), GFP_KERNEL);
+   q = kmalloc(struct_size(q, entries, total_count), GFP_KERNEL);
if (!q)
return -ENOMEM;
q->task = current;
@@ -847,7 +854,7 @@ static int setup_wait(struct ntsync_device *dev,
q->ownerdead = false;
q->count = count;
 
-   for (i = 0; i < count; i++) {
+   for (i = 0; i < total_count; i++) {
struct ntsync_q_entry *entry = &q->entries[i];
struct ntsync_obj *obj = get_obj(dev, fds[i]);
 
@@ -897,9 +904,9 @@ static void try_wake_any_obj(struct ntsync_obj *obj)
 static int ntsync_wait_any(struct ntsync_device *dev, void __user *argp)
 {
struct ntsync_wait_args args;
+   __u32 i, total_count;
struct ntsync_q *q;
int signaled;
-   __u32 i;
int ret;
 
if (copy_from_user(&args, argp, sizeof(args)))
@@ -909,9 +916,13 @@ static int ntsync_wait_any(struct ntsync_device *dev, void 
__user *argp)
if (ret < 0)
return ret;
 
+   total_count = args.count;
+   if (args.alert)
+   total_count++;
+
/* queue ourselves */
 
-   for (i = 0; i < args.count; i++) {
+   for (i = 0; i < total_count; i++) {
struct ntsync_q_entry *entry = &q->entries[i];
struct ntsync_obj *obj = entry->obj;
 
@@ -920,9 +931,15 @@ static int ntsync_wait_any(struct ntsync_device *dev, void 
__user *argp)
spin_unlock(&obj->lock);
}
 
-   /* check if we are already signaled */
+   /*
+* Check if we are already signaled.
+*
+* Note that the API requires that normal objects are checked before
+* the alert event. Hence we queue the alert event last, and check
+* objects in order.
+*/
 
-   for (i = 0; i < args.count; i++) {
+   for (i = 0; i < total_count; i++) {
struct ntsync_obj *obj = q->entries[i].obj;
 
if (atomic_read(>signaled) != -1)
@@ -939,7 +956,7 @@ static int ntsync_wait_any(struct ntsync_device *dev, void 
__user *argp)
 
/* and finally, unqueue */
 
-   for (i = 0; i < args.count; i++) {
+   for (i = 0; i < total_count; i++) {
struct ntsync_q_entry *entry = >entries[i];
struct ntsync_obj *obj = entry->obj;
 
@@ -999,6 +1016,14 @@ static int ntsync_wait_all(struct ntsync_device *dev, 
void __user *argp)
 */
list_add_tail(>node, >all_waiters);
}
+   if (args.alert) {
+   struct ntsync_q_entry *entry = &q->entries[args.count];
+   struct ntsync_obj *obj = entry->obj;
+
+   spin_lock_nest_lock(&obj->lock, &dev->wait_all_lock);
+   list_add_tail(&entry->node, &obj->any_waiters);
+   spin_unlock(&obj->lock);
+   }
 
/* check if we are already signaled */
 
@@ -1006,6 +1031,21 @@ static int ntsync_wait_all(struct ntsync_device *dev, 
void __user *argp)
 
spin_unlock(&dev->wait_all_lock);
 
+   /*
+* Check if the alert event is signaled, making sure to do so only

[PATCH v4 05/27] ntsync: Introduce NTSYNC_IOC_MUTEX_KILL.

2024-04-15 Thread Elizabeth Figura
This does not correspond to any NT syscall. Rather, when a thread dies, it
should be called by the NT emulator for each mutex.

NT mutexes are robust (in the pthread sense). When an NT thread dies, any
mutexes it owned are immediately released. Acquisition of those mutexes by other
threads will return a special value indicating that the mutex was abandoned,
like EOWNERDEAD returned from pthread_mutex_lock(), and EOWNERDEAD is indeed
used here for that purpose.
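A minimal sketch of the intended use from the emulator's side (names and the
handle_abandoned_mutex() helper are illustrative only):

    /* When a thread dies, call this for every mutex it may have owned. */
    __u32 dead_owner = exited_thread_id;
    ioctl(mutex_fd, NTSYNC_IOC_MUTEX_KILL, &dead_owner);  /* EPERM if not the owner */

    /* A later wait can still acquire the abandoned mutex, but the wait
     * ioctl reports the abandonment via EOWNERDEAD. */
    if (ioctl(dev_fd, NTSYNC_IOC_WAIT_ANY, &wait) < 0 && errno == EOWNERDEAD)
        handle_abandoned_mutex();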

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 71 +++--
 include/uapi/linux/ntsync.h |  1 +
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index f7911ef78d5b..1e68f96bc2a6 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -57,6 +57,7 @@ struct ntsync_obj {
struct {
__u32 count;
__u32 owner;
+   bool ownerdead;
} mutex;
} u;
 
@@ -109,6 +110,7 @@ struct ntsync_q {
atomic_t signaled;
 
bool all;
+   bool ownerdead;
__u32 count;
struct ntsync_q_entry entries[];
 };
@@ -185,6 +187,9 @@ static void try_wake_all(struct ntsync_device *dev, struct 
ntsync_q *q,
obj->u.sem.count--;
break;
case NTSYNC_TYPE_MUTEX:
+   if (obj->u.mutex.ownerdead)
+   q->ownerdead = true;
+   obj->u.mutex.ownerdead = false;
obj->u.mutex.count++;
obj->u.mutex.owner = q->owner;
break;
@@ -246,6 +251,9 @@ static void try_wake_any_mutex(struct ntsync_obj *mutex)
continue;
 
if (atomic_try_cmpxchg(&q->signaled, &signaled, entry->index)) {
+   if (mutex->u.mutex.ownerdead)
+   q->ownerdead = true;
+   mutex->u.mutex.ownerdead = false;
mutex->u.mutex.count++;
mutex->u.mutex.owner = q->owner;
wake_up_process(q->task);
@@ -377,6 +385,62 @@ static int ntsync_mutex_unlock(struct ntsync_obj *mutex, 
void __user *argp)
return ret;
 }
 
+/*
+ * Actually change the mutex state to mark its owner as dead,
+ * returning -EPERM if not the owner.
+ */
+static int kill_mutex_state(struct ntsync_obj *mutex, __u32 owner)
+{
+   lockdep_assert_held(&mutex->lock);
+
+   if (mutex->u.mutex.owner != owner)
+   return -EPERM;
+
+   mutex->u.mutex.ownerdead = true;
+   mutex->u.mutex.owner = 0;
+   mutex->u.mutex.count = 0;
+   return 0;
+}
+
+static int ntsync_mutex_kill(struct ntsync_obj *mutex, void __user *argp)
+{
+   struct ntsync_device *dev = mutex->dev;
+   __u32 owner;
+   int ret;
+
+   if (get_user(owner, (__u32 __user *)argp))
+   return -EFAULT;
+   if (!owner)
+   return -EINVAL;
+
+   if (mutex->type != NTSYNC_TYPE_MUTEX)
+   return -EINVAL;
+
+   if (atomic_read(&dev->all_hint) > 0) {
+   spin_lock(&dev->wait_all_lock);
+   spin_lock_nest_lock(&mutex->lock, &dev->wait_all_lock);
+
+   ret = kill_mutex_state(mutex, owner);
+   if (!ret) {
+   try_wake_all_obj(dev, mutex);
+   try_wake_any_mutex(mutex);
+   }
+
+   spin_unlock(&mutex->lock);
+   spin_unlock(&dev->wait_all_lock);
+   } else {
+   spin_lock(&mutex->lock);
+
+   ret = kill_mutex_state(mutex, owner);
+   if (!ret)
+   try_wake_any_mutex(mutex);
+
+   spin_unlock(&mutex->lock);
+   }
+
+   return ret;
+}
+
 static int ntsync_obj_release(struct inode *inode, struct file *file)
 {
struct ntsync_obj *obj = file->private_data;
@@ -398,6 +462,8 @@ static long ntsync_obj_ioctl(struct file *file, unsigned 
int cmd,
return ntsync_sem_post(obj, argp);
case NTSYNC_IOC_MUTEX_UNLOCK:
return ntsync_mutex_unlock(obj, argp);
+   case NTSYNC_IOC_MUTEX_KILL:
+   return ntsync_mutex_kill(obj, argp);
default:
return -ENOIOCTLCMD;
}
@@ -592,6 +658,7 @@ static int setup_wait(struct ntsync_device *dev,
q->owner = args->owner;
atomic_set(>signaled, -1);
q->all = all;
+   q->ownerdead = false;
q->count = count;
 
for (i = 0; i < count; i++) {
@@ -699,7 +766,7 @@ static int ntsync_wait_any(struct ntsync_device *dev, void 
__user *argp)
struct ntsync_wait_args __user *user_args = argp;
 
/* even if we caught a signal, we need to communicate success */
-   ret = 0;
+   ret = q->ownerdead ? -EOWNERDEAD : 

[PATCH v4 07/27] ntsync: Introduce NTSYNC_IOC_EVENT_SET.

2024-04-15 Thread Elizabeth Figura
This corresponds to the NT syscall NtSetEvent().

This sets the event to the signaled state, and returns its previous state.
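A one-line user-space illustration (event_fd is assumed to come from
NTSYNC_IOC_CREATE_EVENT):

    __u32 prev_state;

    /* Signal the event; prev_state receives the 0/1 state it had before. */
    if (ioctl(event_fd, NTSYNC_IOC_EVENT_SET, &prev_state) < 0)
        perror("NTSYNC_IOC_EVENT_SET");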

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 37 +
 include/uapi/linux/ntsync.h |  1 +
 2 files changed, 38 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 3e125c805c00..69f359241cf6 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -473,6 +473,41 @@ static int ntsync_mutex_kill(struct ntsync_obj *mutex, 
void __user *argp)
return ret;
 }
 
+static int ntsync_event_set(struct ntsync_obj *event, void __user *argp)
+{
+   struct ntsync_device *dev = event->dev;
+   __u32 prev_state;
+
+   if (event->type != NTSYNC_TYPE_EVENT)
+   return -EINVAL;
+
+   if (atomic_read(&dev->all_hint) > 0) {
+   spin_lock(&dev->wait_all_lock);
+   spin_lock_nest_lock(&event->lock, &dev->wait_all_lock);
+
+   prev_state = event->u.event.signaled;
+   event->u.event.signaled = true;
+   try_wake_all_obj(dev, event);
+   try_wake_any_event(event);
+
+   spin_unlock(&event->lock);
+   spin_unlock(&dev->wait_all_lock);
+   } else {
+   spin_lock(&event->lock);
+
+   prev_state = event->u.event.signaled;
+   event->u.event.signaled = true;
+   try_wake_any_event(event);
+
+   spin_unlock(&event->lock);
+   }
+
+   if (put_user(prev_state, (__u32 __user *)argp))
+   return -EFAULT;
+
+   return 0;
+}
+
 static int ntsync_obj_release(struct inode *inode, struct file *file)
 {
struct ntsync_obj *obj = file->private_data;
@@ -496,6 +531,8 @@ static long ntsync_obj_ioctl(struct file *file, unsigned 
int cmd,
return ntsync_mutex_unlock(obj, argp);
case NTSYNC_IOC_MUTEX_KILL:
return ntsync_mutex_kill(obj, argp);
+   case NTSYNC_IOC_EVENT_SET:
+   return ntsync_event_set(obj, argp);
default:
return -ENOIOCTLCMD;
}
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 0d133f2eaf0b..65329d15a472 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -52,5 +52,6 @@ struct ntsync_wait_args {
 #define NTSYNC_IOC_SEM_POST_IOWR('N', 0x81, __u32)
 #define NTSYNC_IOC_MUTEX_UNLOCK_IOWR('N', 0x85, struct 
ntsync_mutex_args)
 #define NTSYNC_IOC_MUTEX_KILL  _IOW ('N', 0x86, __u32)
+#define NTSYNC_IOC_EVENT_SET   _IOR ('N', 0x88, __u32)
 
 #endif
-- 
2.43.0




[PATCH v4 02/27] ntsync: Introduce NTSYNC_IOC_WAIT_ALL.

2024-04-15 Thread Elizabeth Figura
This is similar to NTSYNC_IOC_WAIT_ANY, but waits until all of the objects are
simultaneously signaled, and then acquires all of them as a single atomic
operation.
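A short user-space sketch contrasting this with NTSYNC_IOC_WAIT_ANY (the fds are
assumed to exist already; names are placeholders):

    struct ntsync_wait_args wait = {0};
    int objs[2] = { sem_fd, mutex_fd };
    int ret;

    wait.timeout = UINT64_MAX;
    wait.objs = (uintptr_t)objs;
    wait.count = 2;
    wait.owner = tid;

    /* Returns only once the semaphore count is nonzero AND the mutex is free
     * (or already held by `tid`); both are then acquired as one atomic
     * operation, rather than whichever becomes available first. */
    ret = ioctl(dev_fd, NTSYNC_IOC_WAIT_ALL, &wait);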

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 243 ++--
 include/uapi/linux/ntsync.h |   1 +
 2 files changed, 236 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index c6f84a5fc8c0..e914d626465a 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -55,7 +55,34 @@ struct ntsync_obj {
} sem;
} u;
 
+   /*
+* any_waiters is protected by the object lock, but all_waiters is
+* protected by the device wait_all_lock.
+*/
struct list_head any_waiters;
+   struct list_head all_waiters;
+
+   /*
+* Hint describing how many tasks are queued on this object in a
+* wait-all operation.
+*
+* Any time we do a wake, we may need to wake "all" waiters as well as
+* "any" waiters. In order to atomically wake "all" waiters, we must
+* lock all of the objects, and that means grabbing the wait_all_lock
+* below (and, due to lock ordering rules, before locking this object).
+* However, wait-all is a rare operation, and grabbing the wait-all
+* lock for every wake would create unnecessary contention.
+* Therefore we first check whether all_hint is zero, and, if it is,
+* we skip trying to wake "all" waiters.
+*
+* This hint isn't protected by any lock. It might change during the
+* course of a wake, but there's no meaningful race there; it's only a
+* hint.
+*
+* Since wait requests must originate from user-space threads, we're
+* limited here by PID_MAX_LIMIT, so there's no risk of overflow.
+*/
+   atomic_t all_hint;
 };
 
 struct ntsync_q_entry {
@@ -76,14 +103,100 @@ struct ntsync_q {
 */
atomic_t signaled;
 
+   bool all;
__u32 count;
struct ntsync_q_entry entries[];
 };
 
 struct ntsync_device {
+   /*
+* Wait-all operations must atomically grab all objects, and be totally
+* ordered with respect to each other and wait-any operations.
+* If one thread is trying to acquire several objects, another thread
+* cannot touch the object at the same time.
+*
+* We achieve this by grabbing multiple object locks at the same time.
+* However, this creates a lock ordering problem. To solve that problem,
+* wait_all_lock is taken first whenever multiple objects must be locked
+* at the same time.
+*/
+   spinlock_t wait_all_lock;
+
struct file *file;
 };
 
+static bool is_signaled(struct ntsync_obj *obj, __u32 owner)
+{
+   lockdep_assert_held(&obj->lock);
+
+   switch (obj->type) {
+   case NTSYNC_TYPE_SEM:
+   return !!obj->u.sem.count;
+   }
+
+   WARN(1, "bad object type %#x\n", obj->type);
+   return false;
+}
+
+/*
+ * "locked_obj" is an optional pointer to an object which is already locked and
+ * should not be locked again. This is necessary so that changing an object's
+ * state and waking it can be a single atomic operation.
+ */
+static void try_wake_all(struct ntsync_device *dev, struct ntsync_q *q,
+struct ntsync_obj *locked_obj)
+{
+   __u32 count = q->count;
+   bool can_wake = true;
+   int signaled = -1;
+   __u32 i;
+
+   lockdep_assert_held(&dev->wait_all_lock);
+   if (locked_obj)
+   lockdep_assert_held(&locked_obj->lock);
+
+   for (i = 0; i < count; i++) {
+   if (q->entries[i].obj != locked_obj)
+   spin_lock_nest_lock(&q->entries[i].obj->lock, &dev->wait_all_lock);
+   }
+
+   for (i = 0; i < count; i++) {
+   if (!is_signaled(q->entries[i].obj, q->owner)) {
+   can_wake = false;
+   break;
+   }
+   }
+
+   if (can_wake && atomic_try_cmpxchg(&q->signaled, &signaled, 0)) {
+   for (i = 0; i < count; i++) {
+   struct ntsync_obj *obj = q->entries[i].obj;
+
+   switch (obj->type) {
+   case NTSYNC_TYPE_SEM:
+   obj->u.sem.count--;
+   break;
+   }
+   }
+   wake_up_process(q->task);
+   }
+
+   for (i = 0; i < count; i++) {
+   if (q->entries[i].obj != locked_obj)
+   spin_unlock(&q->entries[i].obj->lock);
+   }
+}
+
+static void try_wake_all_obj(struct ntsync_device *dev, struct ntsync_obj *obj)
+{
+   struct ntsync_q_entry *entry;
+
+   lockdep_assert_held(&dev->wait_all_lock);
+   lockdep_assert_held(&obj->lock);
+
+   list_for_each_entry(entry, &obj->all_waiters, node)
+   try_wake_all(dev, 

[PATCH v4 01/27] ntsync: Introduce NTSYNC_IOC_WAIT_ANY.

2024-04-15 Thread Elizabeth Figura
This corresponds to part of the functionality of the NT syscall
NtWaitForMultipleObjects(). Specifically, it implements the behaviour where
the third argument (wait_any) is TRUE, and it does not handle alertable waits.
Those features have been split out into separate patches to ease review.

This patch therefore implements the wait/wake infrastructure which comprises the
core of ntsync's functionality.

NTSYNC_IOC_WAIT_ANY is a vectored wait function similar to poll(). Unlike
poll(), it "consumes" objects when they are signaled. For semaphores, this means
decreasing one from the internal counter. At most one object can be consumed by
this function.

This wait/wake model is fundamentally different from that used anywhere else in
the kernel, and for that reason ntsync does not use any existing infrastructure,
such as futexes, kernel mutexes or semaphores, or wait_event().

Up to 64 objects can be waited on at once. As soon as one is signaled, the
object with the lowest index is consumed, and that index is returned via the
"index" field.

A timeout is supported. The timeout is passed as a u64 nanosecond value, which
represents absolute time measured against either the MONOTONIC or REALTIME clock
(controlled by the flags argument). If U64_MAX is passed, the ioctl waits
indefinitely.

This ioctl validates that all objects belong to the relevant device. This is not
necessary for any technical reason related to NTSYNC_IOC_WAIT_ANY, but will be
necessary for NTSYNC_IOC_WAIT_ALL introduced in the following patch.

Two u32s of padding are left in the ntsync_wait_args structure; one will be used
by a patch later in the series (which is split out to ease review).

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 250 
 include/uapi/linux/ntsync.h |  16 +++
 2 files changed, 266 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 3c2f743c58b0..c6f84a5fc8c0 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -6,11 +6,16 @@
  */
 
 #include 
+#include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -30,6 +35,8 @@ enum ntsync_type {
  *
  * Both rely on struct file for reference counting. Individual
  * ntsync_obj objects take a reference to the device when created.
+ * Wait operations take a reference to each object being waited on for
+ * the duration of the wait.
  */
 
 struct ntsync_obj {
@@ -47,12 +54,56 @@ struct ntsync_obj {
__u32 max;
} sem;
} u;
+
+   struct list_head any_waiters;
+};
+
+struct ntsync_q_entry {
+   struct list_head node;
+   struct ntsync_q *q;
+   struct ntsync_obj *obj;
+   __u32 index;
+};
+
+struct ntsync_q {
+   struct task_struct *task;
+   __u32 owner;
+
+   /*
+* Protected via atomic_try_cmpxchg(). Only the thread that wins the
+* compare-and-swap may actually change object states and wake this
+* task.
+*/
+   atomic_t signaled;
+
+   __u32 count;
+   struct ntsync_q_entry entries[];
 };
 
 struct ntsync_device {
struct file *file;
 };
 
+static void try_wake_any_sem(struct ntsync_obj *sem)
+{
+   struct ntsync_q_entry *entry;
+
+   lockdep_assert_held(&sem->lock);
+
+   list_for_each_entry(entry, &sem->any_waiters, node) {
+   struct ntsync_q *q = entry->q;
+   int signaled = -1;
+
+   if (!sem->u.sem.count)
+   break;
+
+   if (atomic_try_cmpxchg(&q->signaled, &signaled, entry->index)) {
+   sem->u.sem.count--;
+   wake_up_process(q->task);
+   }
+   }
+}
+
 /*
  * Actually change the semaphore state, returning -EOVERFLOW if it is made
  * invalid.
@@ -88,6 +139,8 @@ static int ntsync_sem_post(struct ntsync_obj *sem, void 
__user *argp)
 
prev_count = sem->u.sem.count;
ret = post_sem_state(sem, args);
+   if (!ret)
+   try_wake_any_sem(sem);
 
spin_unlock(&sem->lock);
 
@@ -141,6 +194,7 @@ static struct ntsync_obj *ntsync_alloc_obj(struct 
ntsync_device *dev,
obj->dev = dev;
get_file(dev->file);
spin_lock_init(&obj->lock);
+   INIT_LIST_HEAD(&obj->any_waiters);
 
return obj;
 }
@@ -191,6 +245,200 @@ static int ntsync_create_sem(struct ntsync_device *dev, 
void __user *argp)
return put_user(fd, &user_args->sem);
 }
 
+static struct ntsync_obj *get_obj(struct ntsync_device *dev, int fd)
+{
+   struct file *file = fget(fd);
+   struct ntsync_obj *obj;
+
+   if (!file)
+   return NULL;
+
+   if (file->f_op != &ntsync_obj_fops) {
+   fput(file);
+   return NULL;
+   }
+
+   obj = file->private_data;
+   if (obj->dev != dev) {
+   fput(file);
+   return NULL;
+   }
+
+   return obj;
+}
+
+static void put_obj(struct 

[PATCH v4 06/27] ntsync: Introduce NTSYNC_IOC_CREATE_EVENT.

2024-04-15 Thread Elizabeth Figura
This corresponds to the NT syscall NtCreateEvent().

An NT event holds a single bit of state denoting whether it is signaled or
unsignaled.

There are two types of events: manual-reset and automatic-reset. When an
automatic-reset event is acquired via a wait function, its state is reset to
unsignaled. Manual-reset events are not affected by wait functions.

Whether the event is manual-reset, and its initial state, are specified at
creation time.
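A brief user-space sketch (dev_fd is an open /dev/ntsync descriptor):

    struct ntsync_event_args args;

    args.manual = 1;       /* manual-reset */
    args.signaled = 0;     /* initially unsignaled */
    if (ioctl(dev_fd, NTSYNC_IOC_CREATE_EVENT, &args) < 0)
        return -1;
    /* args.event now holds the new event's file descriptor. */

    /* Passing args.manual = 0 instead creates an auto-reset event, which is
     * reset to unsignaled each time it satisfies a wait. */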

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 61 +
 include/uapi/linux/ntsync.h |  7 +
 2 files changed, 68 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 1e68f96bc2a6..3e125c805c00 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -25,6 +25,7 @@
 enum ntsync_type {
NTSYNC_TYPE_SEM,
NTSYNC_TYPE_MUTEX,
+   NTSYNC_TYPE_EVENT,
 };
 
 /*
@@ -59,6 +60,10 @@ struct ntsync_obj {
__u32 owner;
bool ownerdead;
} mutex;
+   struct {
+   bool manual;
+   bool signaled;
+   } event;
} u;
 
/*
@@ -143,6 +148,8 @@ static bool is_signaled(struct ntsync_obj *obj, __u32 owner)
if (obj->u.mutex.owner && obj->u.mutex.owner != owner)
return false;
return obj->u.mutex.count < UINT_MAX;
+   case NTSYNC_TYPE_EVENT:
+   return obj->u.event.signaled;
}
 
WARN(1, "bad object type %#x\n", obj->type);
@@ -193,6 +200,10 @@ static void try_wake_all(struct ntsync_device *dev, struct 
ntsync_q *q,
obj->u.mutex.count++;
obj->u.mutex.owner = q->owner;
break;
+   case NTSYNC_TYPE_EVENT:
+   if (!obj->u.event.manual)
+   obj->u.event.signaled = false;
+   break;
}
}
wake_up_process(q->task);
@@ -261,6 +272,27 @@ static void try_wake_any_mutex(struct ntsync_obj *mutex)
}
 }
 
+static void try_wake_any_event(struct ntsync_obj *event)
+{
+   struct ntsync_q_entry *entry;
+
+   lockdep_assert_held(&event->lock);
+
+   list_for_each_entry(entry, &event->any_waiters, node) {
+   struct ntsync_q *q = entry->q;
+   int signaled = -1;
+
+   if (!event->u.event.signaled)
+   break;
+
+   if (atomic_try_cmpxchg(&q->signaled, &signaled, entry->index)) {
+   if (!event->u.event.manual)
+   event->u.event.signaled = false;
+   wake_up_process(q->task);
+   }
+   }
+}
+
 /*
  * Actually change the semaphore state, returning -EOVERFLOW if it is made
  * invalid.
@@ -569,6 +601,30 @@ static int ntsync_create_mutex(struct ntsync_device *dev, 
void __user *argp)
return put_user(fd, _args->mutex);
 }
 
+static int ntsync_create_event(struct ntsync_device *dev, void __user *argp)
+{
+   struct ntsync_event_args __user *user_args = argp;
+   struct ntsync_event_args args;
+   struct ntsync_obj *event;
+   int fd;
+
+   if (copy_from_user(&args, argp, sizeof(args)))
+   return -EFAULT;
+
+   event = ntsync_alloc_obj(dev, NTSYNC_TYPE_EVENT);
+   if (!event)
+   return -ENOMEM;
+   event->u.event.manual = args.manual;
+   event->u.event.signaled = args.signaled;
+   fd = ntsync_obj_get_fd(event);
+   if (fd < 0) {
+   kfree(event);
+   return fd;
+   }
+
+   return put_user(fd, &user_args->event);
+}
+
 static struct ntsync_obj *get_obj(struct ntsync_device *dev, int fd)
 {
struct file *file = fget(fd);
@@ -702,6 +758,9 @@ static void try_wake_any_obj(struct ntsync_obj *obj)
case NTSYNC_TYPE_MUTEX:
try_wake_any_mutex(obj);
break;
+   case NTSYNC_TYPE_EVENT:
+   try_wake_any_event(obj);
+   break;
}
 }
 
@@ -890,6 +949,8 @@ static long ntsync_char_ioctl(struct file *file, unsigned 
int cmd,
void __user *argp = (void __user *)parm;
 
switch (cmd) {
+   case NTSYNC_IOC_CREATE_EVENT:
+   return ntsync_create_event(dev, argp);
case NTSYNC_IOC_CREATE_MUTEX:
return ntsync_create_mutex(dev, argp);
case NTSYNC_IOC_CREATE_SEM:
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 1bff8f19d6d9..0d133f2eaf0b 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -22,6 +22,12 @@ struct ntsync_mutex_args {
__u32 count;
 };
 
+struct ntsync_event_args {
+   __u32 event;
+   __u32 manual;
+   __u32 signaled;
+};
+
 #define NTSYNC_WAIT_REALTIME   0x1
 
 struct ntsync_wait_args 

[PATCH v4 18/27] selftests: ntsync: Add some tests for wakeup signaling with WINESYNC_IOC_WAIT_ANY.

2024-04-15 Thread Elizabeth Figura
Test contended "wait-for-any" waits, to make sure that scheduling and wakeup
logic works correctly.

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 150 ++
 1 file changed, 150 insertions(+)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index c0f372167557..993f5db23768 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -556,4 +556,154 @@ TEST(test_wait_all)
close(fd);
 }
 
+struct wake_args {
+   int fd;
+   int obj;
+};
+
+struct wait_args {
+   int fd;
+   unsigned long request;
+   struct ntsync_wait_args *args;
+   int ret;
+   int err;
+};
+
+static void *wait_thread(void *arg)
+{
+   struct wait_args *args = arg;
+
+   args->ret = ioctl(args->fd, args->request, args->args);
+   args->err = errno;
+   return NULL;
+}
+
+static __u64 get_abs_timeout(unsigned int ms)
+{
+   struct timespec timeout;
+   clock_gettime(CLOCK_MONOTONIC, &timeout);
+   return (timeout.tv_sec * 1000000000) + timeout.tv_nsec + (ms * 1000000);
+}
+
+static int wait_for_thread(pthread_t thread, unsigned int ms)
+{
+   struct timespec timeout;
+
+   clock_gettime(CLOCK_REALTIME, &timeout);
+   timeout.tv_nsec += ms * 1000000;
+   timeout.tv_sec += (timeout.tv_nsec / 1000000000);
+   timeout.tv_nsec %= 1000000000;
+   return pthread_timedjoin_np(thread, NULL, &timeout);
+}
+
+TEST(wake_any)
+{
+   struct ntsync_mutex_args mutex_args = {0};
+   struct ntsync_wait_args wait_args = {0};
+   struct ntsync_sem_args sem_args = {0};
+   struct wait_args thread_args;
+   int objs[2], fd, ret;
+   __u32 count, index;
+   pthread_t thread;
+
+   fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
+   ASSERT_LE(0, fd);
+
+   sem_args.count = 0;
+   sem_args.max = 3;
+   sem_args.sem = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_SEM, &sem_args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, sem_args.sem);
+
+   mutex_args.owner = 123;
+   mutex_args.count = 1;
+   mutex_args.mutex = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_MUTEX, &mutex_args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, mutex_args.mutex);
+
+   objs[0] = sem_args.sem;
+   objs[1] = mutex_args.mutex;
+
+   /* test waking the semaphore */
+
+   wait_args.timeout = get_abs_timeout(1000);
+   wait_args.objs = (uintptr_t)objs;
+   wait_args.count = 2;
+   wait_args.owner = 456;
+   wait_args.index = 0xdeadbeef;
+   thread_args.fd = fd;
+   thread_args.args = &wait_args;
+   thread_args.request = NTSYNC_IOC_WAIT_ANY;
+   ret = pthread_create(&thread, NULL, wait_thread, &thread_args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   count = 1;
+   ret = post_sem(sem_args.sem, &count);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, count);
+   check_sem_state(sem_args.sem, 0, 3);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, thread_args.ret);
+   EXPECT_EQ(0, wait_args.index);
+
+   /* test waking the mutex */
+
+   /* first grab it again for owner 123 */
+   ret = wait_any(fd, 1, &mutex_args.mutex, 123, &index);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+
+   wait_args.timeout = get_abs_timeout(1000);
+   wait_args.owner = 456;
+   ret = pthread_create(&thread, NULL, wait_thread, &thread_args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   ret = unlock_mutex(mutex_args.mutex, 123, &count);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(2, count);
+
+   ret = pthread_tryjoin_np(thread, NULL);
+   EXPECT_EQ(EBUSY, ret);
+
+   ret = unlock_mutex(mutex_args.mutex, 123, &mutex_args.count);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, mutex_args.count);
+   check_mutex_state(mutex_args.mutex, 1, 456);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, thread_args.ret);
+   EXPECT_EQ(1, wait_args.index);
+
+   /* delete an object while it's being waited on */
+
+   wait_args.timeout = get_abs_timeout(200);
+   wait_args.owner = 123;
+   ret = pthread_create(&thread, NULL, wait_thread, &thread_args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   close(sem_args.sem);
+   close(mutex_args.mutex);
+
+   ret = wait_for_thread(thread, 200);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(-1, thread_args.ret);
+   EXPECT_EQ(ETIMEDOUT, thread_args.err);
+
+   close(fd);
+}
+
 TEST_HARNESS_MAIN
-- 
2.43.0




[PATCH v4 21/27] selftests: ntsync: Add some tests for auto-reset event state.

2024-04-15 Thread Elizabeth Figura
Test event-specific ioctls NTSYNC_IOC_EVENT_SET, NTSYNC_IOC_EVENT_RESET,
NTSYNC_IOC_EVENT_PULSE, NTSYNC_IOC_EVENT_READ for auto-reset events, and
waiting on auto-reset events.
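
For illustration only (not part of this patch): a minimal user-space sketch
of creating an auto-reset event, assuming the uapi definitions added by this
series are available as <linux/ntsync.h>; the helper name is made up and
error handling is omitted.

  #include <linux/types.h>
  #include <sys/ioctl.h>
  #include <linux/ntsync.h>

  /* An auto-reset event is consumed by a successful wait; EVENT_PULSE
   * wakes current waiters but always leaves the event unsignaled. */
  static int create_auto_event(int dev, __u32 signaled)
  {
          struct ntsync_event_args args = { .manual = 0, .signaled = signaled };

          if (ioctl(dev, NTSYNC_IOC_CREATE_EVENT, &args) < 0)
                  return -1;
          return args.event;      /* the event is returned as a new fd */
  }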

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index b6481c2b85cc..12ccb4ec28e4 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -442,6 +442,65 @@ TEST(manual_event_state)
close(fd);
 }
 
+TEST(auto_event_state)
+{
+   struct ntsync_event_args event_args;
+   __u32 index, signaled;
+   int fd, event, ret;
+
+   fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
+   ASSERT_LE(0, fd);
+
+   event_args.manual = 0;
+   event_args.signaled = 1;
+   event_args.event = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_EVENT, &event_args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, event_args.event);
+   event = event_args.event;
+
+   check_event_state(event, 1, 0);
+
+   signaled = 0xdeadbeef;
+   ret = ioctl(event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, signaled);
+   check_event_state(event, 1, 0);
+
+   ret = wait_any(fd, 1, &event, 123, &index);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_event_state(event, 0, 0);
+
+   signaled = 0xdeadbeef;
+   ret = ioctl(event, NTSYNC_IOC_EVENT_RESET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, signaled);
+   check_event_state(event, 0, 0);
+
+   ret = wait_any(fd, 1, &event, 123, &index);
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(ETIMEDOUT, errno);
+
+   ret = ioctl(event, NTSYNC_IOC_EVENT_SET, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, signaled);
+
+   ret = ioctl(event, NTSYNC_IOC_EVENT_PULSE, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, signaled);
+   check_event_state(event, 0, 0);
+
+   ret = ioctl(event, NTSYNC_IOC_EVENT_PULSE, &signaled);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, signaled);
+   check_event_state(event, 0, 0);
+
+   close(event);
+
+   close(fd);
+}
+
 TEST(test_wait_any)
 {
int objs[NTSYNC_MAX_WAIT_COUNT + 1], fd, ret;
-- 
2.43.0




[PATCH v4 10/27] ntsync: Introduce NTSYNC_IOC_SEM_READ.

2024-04-15 Thread Elizabeth Figura
This corresponds to the NT syscall NtQuerySemaphore().

This returns the current count and maximum count of the semaphore.
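
For illustration only (not part of this patch): a user-space sketch of the
new ioctl, assuming the uapi header added by this series is installed as
<linux/ntsync.h>; the helper name is made up and error handling is omitted.

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/ntsync.h>

  /* "sem" is a semaphore fd returned by NTSYNC_IOC_CREATE_SEM. */
  static void print_sem_state(int sem)
  {
          struct ntsync_sem_args args = {0};

          if (ioctl(sem, NTSYNC_IOC_SEM_READ, &args) == 0)
                  printf("count %u, max %u\n", args.count, args.max);
  }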

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 21 +
 include/uapi/linux/ntsync.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index adba4657bf26..961e8d241602 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -532,6 +532,25 @@ static int ntsync_event_reset(struct ntsync_obj *event, 
void __user *argp)
return 0;
 }
 
+static int ntsync_sem_read(struct ntsync_obj *sem, void __user *argp)
+{
+   struct ntsync_sem_args __user *user_args = argp;
+   struct ntsync_sem_args args;
+
+   if (sem->type != NTSYNC_TYPE_SEM)
+   return -EINVAL;
+
+   args.sem = 0;
+   spin_lock(&sem->lock);
+   args.count = sem->u.sem.count;
+   args.max = sem->u.sem.max;
+   spin_unlock(&sem->lock);
+
+   if (copy_to_user(user_args, &args, sizeof(args)))
+   return -EFAULT;
+   return 0;
+}
+
 static int ntsync_obj_release(struct inode *inode, struct file *file)
 {
struct ntsync_obj *obj = file->private_data;
@@ -551,6 +570,8 @@ static long ntsync_obj_ioctl(struct file *file, unsigned 
int cmd,
switch (cmd) {
case NTSYNC_IOC_SEM_POST:
return ntsync_sem_post(obj, argp);
+   case NTSYNC_IOC_SEM_READ:
+   return ntsync_sem_read(obj, argp);
case NTSYNC_IOC_MUTEX_UNLOCK:
return ntsync_mutex_unlock(obj, argp);
case NTSYNC_IOC_MUTEX_KILL:
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 57721f5d31ba..e298400bf25a 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -55,5 +55,6 @@ struct ntsync_wait_args {
 #define NTSYNC_IOC_EVENT_SET   _IOR ('N', 0x88, __u32)
 #define NTSYNC_IOC_EVENT_RESET _IOR ('N', 0x89, __u32)
 #define NTSYNC_IOC_EVENT_PULSE _IOR ('N', 0x8a, __u32)
+#define NTSYNC_IOC_SEM_READ    _IOR ('N', 0x8b, struct ntsync_sem_args)
 
 #endif
-- 
2.43.0




[PATCH v4 16/27] selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ANY.

2024-04-15 Thread Elizabeth Figura
Test basic synchronous functionality of NTSYNC_IOC_WAIT_ANY, when objects are
considered signaled or not signaled, and how they are affected by a successful
wait.
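
For illustration only (not part of this patch): a sketch of a synchronous
wait-any call in the shape used by these tests; as in the selftest helpers,
the timeout is an absolute CLOCK_MONOTONIC value in nanoseconds, and
<linux/ntsync.h> is assumed to carry the definitions added by this series.
The helper name is made up.

  #include <stdint.h>
  #include <linux/types.h>
  #include <sys/ioctl.h>
  #include <time.h>
  #include <linux/ntsync.h>

  /* Wait for any of "count" object fds to become signaled for "owner",
   * with a timeout "ms" milliseconds from now.  Returns the index of the
   * acquired object, or -1 with errno set (e.g. ETIMEDOUT). */
  static int wait_any_ms(int dev, __u32 count, const int *objs,
                         __u32 owner, unsigned int ms)
  {
          struct ntsync_wait_args args = {0};
          struct timespec now;

          clock_gettime(CLOCK_MONOTONIC, &now);
          args.timeout = now.tv_sec * 1000000000ULL + now.tv_nsec +
                         ms * 1000000ULL;
          args.objs = (uintptr_t)objs;
          args.count = count;
          args.owner = owner;
          if (ioctl(dev, NTSYNC_IOC_WAIT_ANY, &args) < 0)
                  return -1;
          return args.index;
  }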

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 119 ++
 1 file changed, 119 insertions(+)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index 7cd0f40594fd..40ad8cbd3138 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -342,4 +342,123 @@ TEST(mutex_state)
close(fd);
 }
 
+TEST(test_wait_any)
+{
+   int objs[NTSYNC_MAX_WAIT_COUNT + 1], fd, ret;
+   struct ntsync_mutex_args mutex_args = {0};
+   struct ntsync_sem_args sem_args = {0};
+   __u32 owner, index, count, i;
+   struct timespec timeout;
+
+   clock_gettime(CLOCK_MONOTONIC, );
+
+   fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
+   ASSERT_LE(0, fd);
+
+   sem_args.count = 2;
+   sem_args.max = 3;
+   sem_args.sem = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_SEM, _args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, sem_args.sem);
+
+   mutex_args.owner = 0;
+   mutex_args.count = 0;
+   mutex_args.mutex = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_MUTEX, _args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, mutex_args.mutex);
+
+   objs[0] = sem_args.sem;
+   objs[1] = mutex_args.mutex;
+
+   ret = wait_any(fd, 2, objs, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_sem_state(sem_args.sem, 1, 3);
+   check_mutex_state(mutex_args.mutex, 0, 0);
+
+   ret = wait_any(fd, 2, objs, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_sem_state(sem_args.sem, 0, 3);
+   check_mutex_state(mutex_args.mutex, 0, 0);
+
+   ret = wait_any(fd, 2, objs, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, index);
+   check_sem_state(sem_args.sem, 0, 3);
+   check_mutex_state(mutex_args.mutex, 1, 123);
+
+   count = 1;
+   ret = post_sem(sem_args.sem, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, count);
+
+   ret = wait_any(fd, 2, objs, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_sem_state(sem_args.sem, 0, 3);
+   check_mutex_state(mutex_args.mutex, 1, 123);
+
+   ret = wait_any(fd, 2, objs, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, index);
+   check_sem_state(sem_args.sem, 0, 3);
+   check_mutex_state(mutex_args.mutex, 2, 123);
+
+   ret = wait_any(fd, 2, objs, 456, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(ETIMEDOUT, errno);
+
+   owner = 123;
+   ret = ioctl(mutex_args.mutex, NTSYNC_IOC_MUTEX_KILL, );
+   EXPECT_EQ(0, ret);
+
+   ret = wait_any(fd, 2, objs, 456, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EOWNERDEAD, errno);
+   EXPECT_EQ(1, index);
+
+   ret = wait_any(fd, 2, objs, 456, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, index);
+
+   /* test waiting on the same object twice */
+   count = 2;
+   ret = post_sem(sem_args.sem, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, count);
+
+   objs[0] = objs[1] = sem_args.sem;
+   ret = wait_any(fd, 2, objs, 456, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_sem_state(sem_args.sem, 1, 3);
+
+   ret = wait_any(fd, 0, NULL, 456, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(ETIMEDOUT, errno);
+
+   for (i = 0; i < NTSYNC_MAX_WAIT_COUNT + 1; ++i)
+   objs[i] = sem_args.sem;
+
+   ret = wait_any(fd, NTSYNC_MAX_WAIT_COUNT, objs, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+
+   ret = wait_any(fd, NTSYNC_MAX_WAIT_COUNT + 1, objs, 123, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EINVAL, errno);
+
+   ret = wait_any(fd, -1, objs, 123, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EINVAL, errno);
+
+   close(sem_args.sem);
+   close(mutex_args.mutex);
+
+   close(fd);
+}
+
 TEST_HARNESS_MAIN
-- 
2.43.0




[PATCH v4 11/27] ntsync: Introduce NTSYNC_IOC_MUTEX_READ.

2024-04-15 Thread Elizabeth Figura
This corresponds to the NT syscall NtQueryMutant().

This returns the recursion count, owner, and abandoned state of the mutex.
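
For illustration only (not part of this patch): a user-space sketch,
assuming <linux/ntsync.h> carries the definitions added by this series;
the helper name is made up.

  #include <errno.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/ntsync.h>

  /* "mutex" is an fd returned by NTSYNC_IOC_CREATE_MUTEX. */
  static void print_mutex_state(int mutex)
  {
          struct ntsync_mutex_args args = {0};

          if (ioctl(mutex, NTSYNC_IOC_MUTEX_READ, &args) == 0)
                  printf("count %u, owner %u\n", args.count, args.owner);
          else if (errno == EOWNERDEAD)
                  printf("abandoned by its previous owner\n");
  }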

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 23 +++
 include/uapi/linux/ntsync.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 961e8d241602..bd043dccc9fa 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -551,6 +551,27 @@ static int ntsync_sem_read(struct ntsync_obj *sem, void 
__user *argp)
return 0;
 }
 
+static int ntsync_mutex_read(struct ntsync_obj *mutex, void __user *argp)
+{
+   struct ntsync_mutex_args __user *user_args = argp;
+   struct ntsync_mutex_args args;
+   int ret;
+
+   if (mutex->type != NTSYNC_TYPE_MUTEX)
+   return -EINVAL;
+
+   args.mutex = 0;
+   spin_lock(&mutex->lock);
+   args.count = mutex->u.mutex.count;
+   args.owner = mutex->u.mutex.owner;
+   ret = mutex->u.mutex.ownerdead ? -EOWNERDEAD : 0;
+   spin_unlock(&mutex->lock);
+
+   if (copy_to_user(user_args, &args, sizeof(args)))
+   return -EFAULT;
+   return ret;
+}
+
 static int ntsync_obj_release(struct inode *inode, struct file *file)
 {
struct ntsync_obj *obj = file->private_data;
@@ -576,6 +597,8 @@ static long ntsync_obj_ioctl(struct file *file, unsigned 
int cmd,
return ntsync_mutex_unlock(obj, argp);
case NTSYNC_IOC_MUTEX_KILL:
return ntsync_mutex_kill(obj, argp);
+   case NTSYNC_IOC_MUTEX_READ:
+   return ntsync_mutex_read(obj, argp);
case NTSYNC_IOC_EVENT_SET:
return ntsync_event_set(obj, argp, false);
case NTSYNC_IOC_EVENT_RESET:
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index e298400bf25a..797e8df10a3a 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -56,5 +56,6 @@ struct ntsync_wait_args {
 #define NTSYNC_IOC_EVENT_RESET _IOR ('N', 0x89, __u32)
 #define NTSYNC_IOC_EVENT_PULSE _IOR ('N', 0x8a, __u32)
 #define NTSYNC_IOC_SEM_READ    _IOR ('N', 0x8b, struct ntsync_sem_args)
+#define NTSYNC_IOC_MUTEX_READ  _IOR ('N', 0x8c, struct ntsync_mutex_args)
 
 #endif
-- 
2.43.0




[PATCH v4 14/27] selftests: ntsync: Add some tests for semaphore state.

2024-04-15 Thread Elizabeth Figura
Wine has tests for its synchronization primitives, but the tests added here
are more accessible to kernel developers, and also allow us to test some edge
cases that Wine does not care about.

This patch adds tests for semaphore-specific ioctls NTSYNC_IOC_SEM_POST and
NTSYNC_IOC_SEM_READ, and waiting on semaphores.
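
For illustration only (not part of this patch): the shape of the user-space
API these tests exercise, assuming <linux/ntsync.h> from this series; the
helper name is made up and error handling is omitted.

  #include <linux/types.h>
  #include <sys/ioctl.h>
  #include <linux/ntsync.h>

  /* "dev" is an open fd for /dev/ntsync.  Returns the new semaphore
   * object, which is itself a file descriptor. */
  static int create_sem(int dev, __u32 count, __u32 max)
  {
          struct ntsync_sem_args args = { .count = count, .max = max };

          if (ioctl(dev, NTSYNC_IOC_CREATE_SEM, &args) < 0)
                  return -1;
          return args.sem;
  }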

Signed-off-by: Elizabeth Figura 
---
 tools/testing/selftests/Makefile  |   1 +
 .../selftests/drivers/ntsync/.gitignore   |   1 +
 .../testing/selftests/drivers/ntsync/Makefile |   7 +
 tools/testing/selftests/drivers/ntsync/config |   1 +
 .../testing/selftests/drivers/ntsync/ntsync.c | 149 ++
 5 files changed, 159 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/ntsync/.gitignore
 create mode 100644 tools/testing/selftests/drivers/ntsync/Makefile
 create mode 100644 tools/testing/selftests/drivers/ntsync/config
 create mode 100644 tools/testing/selftests/drivers/ntsync/ntsync.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index e1504833654d..6f95206325e1 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -16,6 +16,7 @@ TARGETS += damon
 TARGETS += devices
 TARGETS += dmabuf-heaps
 TARGETS += drivers/dma-buf
+TARGETS += drivers/ntsync
 TARGETS += drivers/s390x/uvdevice
 TARGETS += drivers/net/bonding
 TARGETS += drivers/net/team
diff --git a/tools/testing/selftests/drivers/ntsync/.gitignore 
b/tools/testing/selftests/drivers/ntsync/.gitignore
new file mode 100644
index ..848573a3d3ea
--- /dev/null
+++ b/tools/testing/selftests/drivers/ntsync/.gitignore
@@ -0,0 +1 @@
+ntsync
diff --git a/tools/testing/selftests/drivers/ntsync/Makefile 
b/tools/testing/selftests/drivers/ntsync/Makefile
new file mode 100644
index ..dbf2b055c0b2
--- /dev/null
+++ b/tools/testing/selftests/drivers/ntsync/Makefile
@@ -0,0 +1,7 @@
+# SPDX-LICENSE-IDENTIFIER: GPL-2.0-only
+TEST_GEN_PROGS := ntsync
+
+CFLAGS += $(KHDR_INCLUDES)
+LDLIBS += -lpthread
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/drivers/ntsync/config 
b/tools/testing/selftests/drivers/ntsync/config
new file mode 100644
index ..60539c826d06
--- /dev/null
+++ b/tools/testing/selftests/drivers/ntsync/config
@@ -0,0 +1 @@
+CONFIG_NTSYNC=y
diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
new file mode 100644
index ..1e145c6dfded
--- /dev/null
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Various unit tests for the "ntsync" synchronization primitive driver.
+ *
+ * Copyright (C) 2021-2022 Elizabeth Figura 
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../../kselftest_harness.h"
+
+static int read_sem_state(int sem, __u32 *count, __u32 *max)
+{
+   struct ntsync_sem_args args;
+   int ret;
+
+   memset(&args, 0xcc, sizeof(args));
+   ret = ioctl(sem, NTSYNC_IOC_SEM_READ, &args);
+   *count = args.count;
+   *max = args.max;
+   return ret;
+}
+
+#define check_sem_state(sem, count, max) \
+   ({ \
+   __u32 __count, __max; \
+   int ret = read_sem_state((sem), &__count, &__max); \
+   EXPECT_EQ(0, ret); \
+   EXPECT_EQ((count), __count); \
+   EXPECT_EQ((max), __max); \
+   })
+
+static int post_sem(int sem, __u32 *count)
+{
+   return ioctl(sem, NTSYNC_IOC_SEM_POST, count);
+}
+
+static int wait_any(int fd, __u32 count, const int *objs, __u32 owner, __u32 
*index)
+{
+   struct ntsync_wait_args args = {0};
+   struct timespec timeout;
+   int ret;
+
+   clock_gettime(CLOCK_MONOTONIC, &timeout);
+
+   args.timeout = timeout.tv_sec * 1000000000 + timeout.tv_nsec;
+   args.count = count;
+   args.objs = (uintptr_t)objs;
+   args.owner = owner;
+   args.index = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_WAIT_ANY, &args);
+   *index = args.index;
+   return ret;
+}
+
+TEST(semaphore_state)
+{
+   struct ntsync_sem_args sem_args;
+   struct timespec timeout;
+   __u32 count, index;
+   int fd, ret, sem;
+
+   clock_gettime(CLOCK_MONOTONIC, &timeout);
+
+   fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
+   ASSERT_LE(0, fd);
+
+   sem_args.count = 3;
+   sem_args.max = 2;
+   sem_args.sem = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_SEM, &sem_args);
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EINVAL, errno);
+
+   sem_args.count = 2;
+   sem_args.max = 2;
+   sem_args.sem = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_SEM, &sem_args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, sem_args.sem);
+   sem = sem_args.sem;
+   check_sem_state(sem, 2, 2);
+
+   count = 0;
+   ret = post_sem(sem, &count);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(2, count);
+   check_sem_state(sem, 2, 2);
+
+   

[PATCH v4 08/27] ntsync: Introduce NTSYNC_IOC_EVENT_RESET.

2024-04-15 Thread Elizabeth Figura
This corresponds to the NT syscall NtResetEvent().

This sets the event to the unsignaled state, and returns its previous state.
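
For illustration only (not part of this patch): a minimal user-space
sketch, assuming <linux/ntsync.h> from this series; the helper name is
made up.

  #include <linux/types.h>
  #include <sys/ioctl.h>
  #include <linux/ntsync.h>

  /* Returns the event's previous signaled state (0 or 1), or -1 on
   * error. */
  static int reset_event(int event)
  {
          __u32 prev;

          if (ioctl(event, NTSYNC_IOC_EVENT_RESET, &prev) < 0)
                  return -1;
          return prev;
  }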

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 22 ++
 include/uapi/linux/ntsync.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 69f359241cf6..ae78425c87d1 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -508,6 +508,26 @@ static int ntsync_event_set(struct ntsync_obj *event, void 
__user *argp)
return 0;
 }
 
+static int ntsync_event_reset(struct ntsync_obj *event, void __user *argp)
+{
+   __u32 prev_state;
+
+   if (event->type != NTSYNC_TYPE_EVENT)
+   return -EINVAL;
+
+   spin_lock(&event->lock);
+
+   prev_state = event->u.event.signaled;
+   event->u.event.signaled = false;
+
+   spin_unlock(&event->lock);
+
+   if (put_user(prev_state, (__u32 __user *)argp))
+   return -EFAULT;
+
+   return 0;
+}
+
 static int ntsync_obj_release(struct inode *inode, struct file *file)
 {
struct ntsync_obj *obj = file->private_data;
@@ -533,6 +553,8 @@ static long ntsync_obj_ioctl(struct file *file, unsigned 
int cmd,
return ntsync_mutex_kill(obj, argp);
case NTSYNC_IOC_EVENT_SET:
return ntsync_event_set(obj, argp);
+   case NTSYNC_IOC_EVENT_RESET:
+   return ntsync_event_reset(obj, argp);
default:
return -ENOIOCTLCMD;
}
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 65329d15a472..657542107328 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -53,5 +53,6 @@ struct ntsync_wait_args {
 #define NTSYNC_IOC_MUTEX_UNLOCK_IOWR('N', 0x85, struct 
ntsync_mutex_args)
 #define NTSYNC_IOC_MUTEX_KILL  _IOW ('N', 0x86, __u32)
 #define NTSYNC_IOC_EVENT_SET   _IOR ('N', 0x88, __u32)
+#define NTSYNC_IOC_EVENT_RESET _IOR ('N', 0x89, __u32)
 
 #endif
-- 
2.43.0




[PATCH v4 17/27] selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ALL.

2024-04-15 Thread Elizabeth Figura
Test basic synchronous functionality of NTSYNC_IOC_WAIT_ALL, and when objects
are considered simultaneously signaled.

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 99 ++-
 1 file changed, 97 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index 40ad8cbd3138..c0f372167557 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -73,7 +73,8 @@ static int unlock_mutex(int mutex, __u32 owner, __u32 *count)
return ret;
 }
 
-static int wait_any(int fd, __u32 count, const int *objs, __u32 owner, __u32 
*index)
+static int wait_objs(int fd, unsigned long request, __u32 count,
+const int *objs, __u32 owner, __u32 *index)
 {
struct ntsync_wait_args args = {0};
struct timespec timeout;
@@ -86,11 +87,21 @@ static int wait_any(int fd, __u32 count, const int *objs, 
__u32 owner, __u32 *in
args.objs = (uintptr_t)objs;
args.owner = owner;
args.index = 0xdeadbeef;
-   ret = ioctl(fd, NTSYNC_IOC_WAIT_ANY, );
+   ret = ioctl(fd, request, );
*index = args.index;
return ret;
 }
 
+static int wait_any(int fd, __u32 count, const int *objs, __u32 owner, __u32 
*index)
+{
+   return wait_objs(fd, NTSYNC_IOC_WAIT_ANY, count, objs, owner, index);
+}
+
+static int wait_all(int fd, __u32 count, const int *objs, __u32 owner, __u32 
*index)
+{
+   return wait_objs(fd, NTSYNC_IOC_WAIT_ALL, count, objs, owner, index);
+}
+
 TEST(semaphore_state)
 {
struct ntsync_sem_args sem_args;
@@ -461,4 +472,88 @@ TEST(test_wait_any)
close(fd);
 }
 
+TEST(test_wait_all)
+{
+   struct ntsync_mutex_args mutex_args = {0};
+   struct ntsync_sem_args sem_args = {0};
+   __u32 owner, index, count;
+   int objs[2], fd, ret;
+
+   fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
+   ASSERT_LE(0, fd);
+
+   sem_args.count = 2;
+   sem_args.max = 3;
+   sem_args.sem = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_SEM, _args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, sem_args.sem);
+
+   mutex_args.owner = 0;
+   mutex_args.count = 0;
+   mutex_args.mutex = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_MUTEX, _args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, mutex_args.mutex);
+
+   objs[0] = sem_args.sem;
+   objs[1] = mutex_args.mutex;
+
+   ret = wait_all(fd, 2, objs, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_sem_state(sem_args.sem, 1, 3);
+   check_mutex_state(mutex_args.mutex, 1, 123);
+
+   ret = wait_all(fd, 2, objs, 456, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(ETIMEDOUT, errno);
+   check_sem_state(sem_args.sem, 1, 3);
+   check_mutex_state(mutex_args.mutex, 1, 123);
+
+   ret = wait_all(fd, 2, objs, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_sem_state(sem_args.sem, 0, 3);
+   check_mutex_state(mutex_args.mutex, 2, 123);
+
+   ret = wait_all(fd, 2, objs, 123, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(ETIMEDOUT, errno);
+   check_sem_state(sem_args.sem, 0, 3);
+   check_mutex_state(mutex_args.mutex, 2, 123);
+
+   count = 3;
+   ret = post_sem(sem_args.sem, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, count);
+
+   ret = wait_all(fd, 2, objs, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_sem_state(sem_args.sem, 2, 3);
+   check_mutex_state(mutex_args.mutex, 3, 123);
+
+   owner = 123;
+   ret = ioctl(mutex_args.mutex, NTSYNC_IOC_MUTEX_KILL, );
+   EXPECT_EQ(0, ret);
+
+   ret = wait_all(fd, 2, objs, 123, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EOWNERDEAD, errno);
+   check_sem_state(sem_args.sem, 1, 3);
+   check_mutex_state(mutex_args.mutex, 1, 123);
+
+   /* test waiting on the same object twice */
+   objs[0] = objs[1] = sem_args.sem;
+   ret = wait_all(fd, 2, objs, 123, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EINVAL, errno);
+
+   close(sem_args.sem);
+   close(mutex_args.mutex);
+
+   close(fd);
+}
+
 TEST_HARNESS_MAIN
-- 
2.43.0




[PATCH v4 03/27] ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.

2024-04-15 Thread Elizabeth Figura
This corresponds to the NT syscall NtCreateMutant().

An NT mutex is recursive, with a 32-bit recursion counter. When acquired via
NtWaitForMultipleObjects(), the recursion counter is incremented by one.

The OS records the thread which acquired it. However, in order to keep this
driver self-contained, the owning thread ID is managed by user-space, and passed
as a parameter to all relevant ioctls.

The initial owner and recursion count, if any, are specified when the mutex is
created.
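
For illustration only (not part of this patch): a sketch of creating an
initially owned mutex from user space, assuming <linux/ntsync.h> from this
series; the owner value is an arbitrary thread ID managed by user space,
and the helper name is made up.

  #include <linux/types.h>
  #include <sys/ioctl.h>
  #include <linux/ntsync.h>

  /* "dev" is an open fd for /dev/ntsync. */
  static int create_owned_mutex(int dev, __u32 owner)
  {
          struct ntsync_mutex_args args = { .owner = owner, .count = 1 };

          if (ioctl(dev, NTSYNC_IOC_CREATE_MUTEX, &args) < 0)
                  return -1;
          return args.mutex;      /* the mutex is returned as a new fd */
  }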

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 68 +
 include/uapi/linux/ntsync.h |  7 
 2 files changed, 75 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index e914d626465a..173513aeeacc 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -24,6 +24,7 @@
 
 enum ntsync_type {
NTSYNC_TYPE_SEM,
+   NTSYNC_TYPE_MUTEX,
 };
 
 /*
@@ -53,6 +54,10 @@ struct ntsync_obj {
__u32 count;
__u32 max;
} sem;
+   struct {
+   __u32 count;
+   __u32 owner;
+   } mutex;
} u;
 
/*
@@ -132,6 +137,10 @@ static bool is_signaled(struct ntsync_obj *obj, __u32 
owner)
switch (obj->type) {
case NTSYNC_TYPE_SEM:
return !!obj->u.sem.count;
+   case NTSYNC_TYPE_MUTEX:
+   if (obj->u.mutex.owner && obj->u.mutex.owner != owner)
+   return false;
+   return obj->u.mutex.count < UINT_MAX;
}
 
WARN(1, "bad object type %#x\n", obj->type);
@@ -175,6 +184,10 @@ static void try_wake_all(struct ntsync_device *dev, struct 
ntsync_q *q,
case NTSYNC_TYPE_SEM:
obj->u.sem.count--;
break;
+   case NTSYNC_TYPE_MUTEX:
+   obj->u.mutex.count++;
+   obj->u.mutex.owner = q->owner;
+   break;
}
}
wake_up_process(q->task);
@@ -217,6 +230,29 @@ static void try_wake_any_sem(struct ntsync_obj *sem)
}
 }
 
+static void try_wake_any_mutex(struct ntsync_obj *mutex)
+{
+   struct ntsync_q_entry *entry;
+
+   lockdep_assert_held(&mutex->lock);
+
+   list_for_each_entry(entry, &mutex->any_waiters, node) {
+   struct ntsync_q *q = entry->q;
+   int signaled = -1;
+
+   if (mutex->u.mutex.count == UINT_MAX)
+   break;
+   if (mutex->u.mutex.owner && mutex->u.mutex.owner != q->owner)
+   continue;
+
+   if (atomic_try_cmpxchg(&q->signaled, &signaled, entry->index)) {
+   mutex->u.mutex.count++;
+   mutex->u.mutex.owner = q->owner;
+   wake_up_process(q->task);
+   }
+   }
+}
+
 /*
  * Actually change the semaphore state, returning -EOVERFLOW if it is made
  * invalid.
@@ -376,6 +412,33 @@ static int ntsync_create_sem(struct ntsync_device *dev, 
void __user *argp)
return put_user(fd, _args->sem);
 }
 
+static int ntsync_create_mutex(struct ntsync_device *dev, void __user *argp)
+{
+   struct ntsync_mutex_args __user *user_args = argp;
+   struct ntsync_mutex_args args;
+   struct ntsync_obj *mutex;
+   int fd;
+
+   if (copy_from_user(&args, argp, sizeof(args)))
+   return -EFAULT;
+
+   if (!args.owner != !args.count)
+   return -EINVAL;
+
+   mutex = ntsync_alloc_obj(dev, NTSYNC_TYPE_MUTEX);
+   if (!mutex)
+   return -ENOMEM;
+   mutex->u.mutex.count = args.count;
+   mutex->u.mutex.owner = args.owner;
+   fd = ntsync_obj_get_fd(mutex);
+   if (fd < 0) {
+   kfree(mutex);
+   return fd;
+   }
+
+   return put_user(fd, &user_args->mutex);
+}
+
 static struct ntsync_obj *get_obj(struct ntsync_device *dev, int fd)
 {
struct file *file = fget(fd);
@@ -505,6 +568,9 @@ static void try_wake_any_obj(struct ntsync_obj *obj)
case NTSYNC_TYPE_SEM:
try_wake_any_sem(obj);
break;
+   case NTSYNC_TYPE_MUTEX:
+   try_wake_any_mutex(obj);
+   break;
}
 }
 
@@ -693,6 +759,8 @@ static long ntsync_char_ioctl(struct file *file, unsigned 
int cmd,
void __user *argp = (void __user *)parm;
 
switch (cmd) {
+   case NTSYNC_IOC_CREATE_MUTEX:
+   return ntsync_create_mutex(dev, argp);
case NTSYNC_IOC_CREATE_SEM:
return ntsync_create_sem(dev, argp);
case NTSYNC_IOC_WAIT_ALL:
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 83784d4438a1..cd7841cdba49 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -16,6 +16,12 @@ struct ntsync_sem_args {
 

[PATCH v4 19/27] selftests: ntsync: Add some tests for wakeup signaling with NTSYNC_IOC_WAIT_ALL.

2024-04-15 Thread Elizabeth Figura
Test contended "wait-for-all" waits, to make sure that scheduling and wakeup
logic works correctly, and that the wait only exits once objects are all
simultaneously signaled.
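
For illustration only (not part of this patch): the producer side that the
test drives from the main thread, assuming <linux/ntsync.h> from this
series.  A waiter blocked in NTSYNC_IOC_WAIT_ALL on { sem, mutex } only
wakes once the semaphore has a nonzero count *and* the mutex is free or
already owned by the waiter, at the same time.  The helper name is made up.

  #include <linux/types.h>
  #include <sys/ioctl.h>
  #include <linux/ntsync.h>

  static void signal_both(int sem, int mutex, __u32 mutex_owner)
  {
          __u32 count = 1;
          struct ntsync_mutex_args unlock = { .owner = mutex_owner };

          ioctl(sem, NTSYNC_IOC_SEM_POST, &count);          /* sem signaled */
          ioctl(mutex, NTSYNC_IOC_MUTEX_UNLOCK, &unlock);   /* mutex freed */
  }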

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 98 +++
 1 file changed, 98 insertions(+)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index 993f5db23768..b77fb0b2c4b1 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -706,4 +706,102 @@ TEST(wake_any)
close(fd);
 }
 
+TEST(wake_all)
+{
+   struct ntsync_mutex_args mutex_args = {0};
+   struct ntsync_wait_args wait_args = {0};
+   struct ntsync_sem_args sem_args = {0};
+   struct wait_args thread_args;
+   int objs[2], fd, ret;
+   __u32 count, index;
+   pthread_t thread;
+
+   fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
+   ASSERT_LE(0, fd);
+
+   sem_args.count = 0;
+   sem_args.max = 3;
+   sem_args.sem = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_SEM, _args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, sem_args.sem);
+
+   mutex_args.owner = 123;
+   mutex_args.count = 1;
+   mutex_args.mutex = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_MUTEX, _args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, mutex_args.mutex);
+
+   objs[0] = sem_args.sem;
+   objs[1] = mutex_args.mutex;
+
+   wait_args.timeout = get_abs_timeout(1000);
+   wait_args.objs = (uintptr_t)objs;
+   wait_args.count = 2;
+   wait_args.owner = 456;
+   thread_args.fd = fd;
+   thread_args.args = _args;
+   thread_args.request = NTSYNC_IOC_WAIT_ALL;
+   ret = pthread_create(, NULL, wait_thread, _args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   count = 1;
+   ret = post_sem(sem_args.sem, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, count);
+
+   ret = pthread_tryjoin_np(thread, NULL);
+   EXPECT_EQ(EBUSY, ret);
+
+   check_sem_state(sem_args.sem, 1, 3);
+
+   ret = wait_any(fd, 1, _args.sem, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+
+   ret = unlock_mutex(mutex_args.mutex, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, count);
+
+   ret = pthread_tryjoin_np(thread, NULL);
+   EXPECT_EQ(EBUSY, ret);
+
+   check_mutex_state(mutex_args.mutex, 0, 0);
+
+   count = 2;
+   ret = post_sem(sem_args.sem, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, count);
+   check_sem_state(sem_args.sem, 1, 3);
+   check_mutex_state(mutex_args.mutex, 1, 456);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, thread_args.ret);
+
+   /* delete an object while it's being waited on */
+
+   wait_args.timeout = get_abs_timeout(200);
+   wait_args.owner = 123;
+   ret = pthread_create(, NULL, wait_thread, _args);
+   EXPECT_EQ(0, ret);
+
+   ret = wait_for_thread(thread, 100);
+   EXPECT_EQ(ETIMEDOUT, ret);
+
+   close(sem_args.sem);
+   close(mutex_args.mutex);
+
+   ret = wait_for_thread(thread, 200);
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(-1, thread_args.ret);
+   EXPECT_EQ(ETIMEDOUT, thread_args.err);
+
+   close(fd);
+}
+
 TEST_HARNESS_MAIN
-- 
2.43.0




[PATCH v4 15/27] selftests: ntsync: Add some tests for mutex state.

2024-04-15 Thread Elizabeth Figura
Test mutex-specific ioctls NTSYNC_IOC_MUTEX_UNLOCK and NTSYNC_IOC_MUTEX_READ,
and waiting on mutexes.

Signed-off-by: Elizabeth Figura 
---
 .../testing/selftests/drivers/ntsync/ntsync.c | 196 ++
 1 file changed, 196 insertions(+)

diff --git a/tools/testing/selftests/drivers/ntsync/ntsync.c 
b/tools/testing/selftests/drivers/ntsync/ntsync.c
index 1e145c6dfded..7cd0f40594fd 100644
--- a/tools/testing/selftests/drivers/ntsync/ntsync.c
+++ b/tools/testing/selftests/drivers/ntsync/ntsync.c
@@ -40,6 +40,39 @@ static int post_sem(int sem, __u32 *count)
return ioctl(sem, NTSYNC_IOC_SEM_POST, count);
 }
 
+static int read_mutex_state(int mutex, __u32 *count, __u32 *owner)
+{
+   struct ntsync_mutex_args args;
+   int ret;
+
+   memset(&args, 0xcc, sizeof(args));
+   ret = ioctl(mutex, NTSYNC_IOC_MUTEX_READ, &args);
+   *count = args.count;
+   *owner = args.owner;
+   return ret;
+}
+
+#define check_mutex_state(mutex, count, owner) \
+   ({ \
+   __u32 __count, __owner; \
+   int ret = read_mutex_state((mutex), &__count, &__owner); \
+   EXPECT_EQ(0, ret); \
+   EXPECT_EQ((count), __count); \
+   EXPECT_EQ((owner), __owner); \
+   })
+
+static int unlock_mutex(int mutex, __u32 owner, __u32 *count)
+{
+   struct ntsync_mutex_args args;
+   int ret;
+
+   args.owner = owner;
+   args.count = 0xdeadbeef;
+   ret = ioctl(mutex, NTSYNC_IOC_MUTEX_UNLOCK, &args);
+   *count = args.count;
+   return ret;
+}
+
 static int wait_any(int fd, __u32 count, const int *objs, __u32 owner, __u32 
*index)
 {
struct ntsync_wait_args args = {0};
@@ -146,4 +179,167 @@ TEST(semaphore_state)
close(fd);
 }
 
+TEST(mutex_state)
+{
+   struct ntsync_mutex_args mutex_args;
+   __u32 owner, count, index;
+   struct timespec timeout;
+   int fd, ret, mutex;
+
+   clock_gettime(CLOCK_MONOTONIC, );
+
+   fd = open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
+   ASSERT_LE(0, fd);
+
+   mutex_args.owner = 123;
+   mutex_args.count = 0;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_MUTEX, _args);
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EINVAL, errno);
+
+   mutex_args.owner = 0;
+   mutex_args.count = 2;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_MUTEX, _args);
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EINVAL, errno);
+
+   mutex_args.owner = 123;
+   mutex_args.count = 2;
+   mutex_args.mutex = 0xdeadbeef;
+   ret = ioctl(fd, NTSYNC_IOC_CREATE_MUTEX, _args);
+   EXPECT_EQ(0, ret);
+   EXPECT_NE(0xdeadbeef, mutex_args.mutex);
+   mutex = mutex_args.mutex;
+   check_mutex_state(mutex, 2, 123);
+
+   ret = unlock_mutex(mutex, 0, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EINVAL, errno);
+
+   ret = unlock_mutex(mutex, 456, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EPERM, errno);
+   check_mutex_state(mutex, 2, 123);
+
+   ret = unlock_mutex(mutex, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(2, count);
+   check_mutex_state(mutex, 1, 123);
+
+   ret = unlock_mutex(mutex, 123, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(1, count);
+   check_mutex_state(mutex, 0, 0);
+
+   ret = unlock_mutex(mutex, 123, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EPERM, errno);
+
+   ret = wait_any(fd, 1, , 456, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_mutex_state(mutex, 1, 456);
+
+   ret = wait_any(fd, 1, , 456, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(0, index);
+   check_mutex_state(mutex, 2, 456);
+
+   ret = unlock_mutex(mutex, 456, );
+   EXPECT_EQ(0, ret);
+   EXPECT_EQ(2, count);
+   check_mutex_state(mutex, 1, 456);
+
+   ret = wait_any(fd, 1, , 123, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(ETIMEDOUT, errno);
+
+   owner = 0;
+   ret = ioctl(mutex, NTSYNC_IOC_MUTEX_KILL, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EINVAL, errno);
+
+   owner = 123;
+   ret = ioctl(mutex, NTSYNC_IOC_MUTEX_KILL, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EPERM, errno);
+   check_mutex_state(mutex, 1, 456);
+
+   owner = 456;
+   ret = ioctl(mutex, NTSYNC_IOC_MUTEX_KILL, );
+   EXPECT_EQ(0, ret);
+
+   memset(_args, 0xcc, sizeof(mutex_args));
+   ret = ioctl(mutex, NTSYNC_IOC_MUTEX_READ, _args);
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EOWNERDEAD, errno);
+   EXPECT_EQ(0, mutex_args.count);
+   EXPECT_EQ(0, mutex_args.owner);
+
+   memset(_args, 0xcc, sizeof(mutex_args));
+   ret = ioctl(mutex, NTSYNC_IOC_MUTEX_READ, _args);
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EOWNERDEAD, errno);
+   EXPECT_EQ(0, mutex_args.count);
+   EXPECT_EQ(0, mutex_args.owner);
+
+   ret = wait_any(fd, 1, , 123, );
+   EXPECT_EQ(-1, ret);
+   EXPECT_EQ(EOWNERDEAD, errno);
+   EXPECT_EQ(0, index);
+   check_mutex_state(mutex, 1, 123);
+
+  

[PATCH v4 12/27] ntsync: Introduce NTSYNC_IOC_EVENT_READ.

2024-04-15 Thread Elizabeth Figura
This corresponds to the NT syscall NtQueryEvent().

This returns the signaled state of the event and whether it is manual-reset.
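
For illustration only (not part of this patch): a user-space sketch,
assuming <linux/ntsync.h> from this series; the helper name is made up.

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/ntsync.h>

  static void print_event_state(int event)
  {
          struct ntsync_event_args args = {0};

          if (ioctl(event, NTSYNC_IOC_EVENT_READ, &args) == 0)
                  printf("%s-reset, %ssignaled\n",
                         args.manual ? "manual" : "auto",
                         args.signaled ? "" : "not ");
  }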

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 21 +
 include/uapi/linux/ntsync.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index bd043dccc9fa..a03c6fceb518 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -572,6 +572,25 @@ static int ntsync_mutex_read(struct ntsync_obj *mutex, 
void __user *argp)
return ret;
 }
 
+static int ntsync_event_read(struct ntsync_obj *event, void __user *argp)
+{
+   struct ntsync_event_args __user *user_args = argp;
+   struct ntsync_event_args args;
+
+   if (event->type != NTSYNC_TYPE_EVENT)
+   return -EINVAL;
+
+   args.event = 0;
+   spin_lock(&event->lock);
+   args.manual = event->u.event.manual;
+   args.signaled = event->u.event.signaled;
+   spin_unlock(&event->lock);
+
+   if (copy_to_user(user_args, &args, sizeof(args)))
+   return -EFAULT;
+   return 0;
+}
+
 static int ntsync_obj_release(struct inode *inode, struct file *file)
 {
struct ntsync_obj *obj = file->private_data;
@@ -605,6 +624,8 @@ static long ntsync_obj_ioctl(struct file *file, unsigned 
int cmd,
return ntsync_event_reset(obj, argp);
case NTSYNC_IOC_EVENT_PULSE:
return ntsync_event_set(obj, argp, true);
+   case NTSYNC_IOC_EVENT_READ:
+   return ntsync_event_read(obj, argp);
default:
return -ENOIOCTLCMD;
}
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index 797e8df10a3a..80f36de46a75 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -57,5 +57,6 @@ struct ntsync_wait_args {
 #define NTSYNC_IOC_EVENT_PULSE _IOR ('N', 0x8a, __u32)
 #define NTSYNC_IOC_SEM_READ    _IOR ('N', 0x8b, struct ntsync_sem_args)
 #define NTSYNC_IOC_MUTEX_READ  _IOR ('N', 0x8c, struct ntsync_mutex_args)
+#define NTSYNC_IOC_EVENT_READ  _IOR ('N', 0x8d, struct ntsync_event_args)
 
 #endif
-- 
2.43.0




[PATCH v4 04/27] ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK.

2024-04-15 Thread Elizabeth Figura
This corresponds to the NT syscall NtReleaseMutant().

This syscall decrements the mutex's recursion count by one, and returns the
previous value. If the mutex is not owned by the given owner ID, the function
instead fails and returns -EPERM.
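
For illustration only (not part of this patch): a user-space sketch,
assuming <linux/ntsync.h> from this series; the helper name is made up.

  #include <linux/types.h>
  #include <sys/ioctl.h>
  #include <linux/ntsync.h>

  /* Release one recursion level held by "owner".  Returns the previous
   * recursion count, or -1 with errno set (EPERM if the mutex is not
   * owned by "owner"). */
  static int release_mutex(int mutex, __u32 owner)
  {
          struct ntsync_mutex_args args = { .owner = owner };

          if (ioctl(mutex, NTSYNC_IOC_MUTEX_UNLOCK, &args) < 0)
                  return -1;
          return args.count;
  }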

Signed-off-by: Elizabeth Figura 
---
 drivers/misc/ntsync.c   | 64 +
 include/uapi/linux/ntsync.h |  1 +
 2 files changed, 65 insertions(+)

diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c
index 173513aeeacc..f7911ef78d5b 100644
--- a/drivers/misc/ntsync.c
+++ b/drivers/misc/ntsync.c
@@ -315,6 +315,68 @@ static int ntsync_sem_post(struct ntsync_obj *sem, void 
__user *argp)
return ret;
 }
 
+/*
+ * Actually change the mutex state, returning -EPERM if not the owner.
+ */
+static int unlock_mutex_state(struct ntsync_obj *mutex,
+ const struct ntsync_mutex_args *args)
+{
+   lockdep_assert_held(&mutex->lock);
+
+   if (mutex->u.mutex.owner != args->owner)
+   return -EPERM;
+
+   if (!--mutex->u.mutex.count)
+   mutex->u.mutex.owner = 0;
+   return 0;
+}
+
+static int ntsync_mutex_unlock(struct ntsync_obj *mutex, void __user *argp)
+{
+   struct ntsync_mutex_args __user *user_args = argp;
+   struct ntsync_device *dev = mutex->dev;
+   struct ntsync_mutex_args args;
+   __u32 prev_count;
+   int ret;
+
+   if (copy_from_user(&args, argp, sizeof(args)))
+   return -EFAULT;
+   if (!args.owner)
+   return -EINVAL;
+
+   if (mutex->type != NTSYNC_TYPE_MUTEX)
+   return -EINVAL;
+
+   if (atomic_read(&dev->all_hint) > 0) {
+   spin_lock(&dev->wait_all_lock);
+   spin_lock_nest_lock(&mutex->lock, &dev->wait_all_lock);
+
+   prev_count = mutex->u.mutex.count;
+   ret = unlock_mutex_state(mutex, &args);
+   if (!ret) {
+   try_wake_all_obj(dev, mutex);
+   try_wake_any_mutex(mutex);
+   }
+
+   spin_unlock(&mutex->lock);
+   spin_unlock(&dev->wait_all_lock);
+   } else {
+   spin_lock(&mutex->lock);
+
+   prev_count = mutex->u.mutex.count;
+   ret = unlock_mutex_state(mutex, &args);
+   if (!ret)
+   try_wake_any_mutex(mutex);
+
+   spin_unlock(&mutex->lock);
+   }
+
+   if (!ret && put_user(prev_count, &user_args->count))
+   ret = -EFAULT;
+
+   return ret;
+}
+
 static int ntsync_obj_release(struct inode *inode, struct file *file)
 {
struct ntsync_obj *obj = file->private_data;
@@ -334,6 +396,8 @@ static long ntsync_obj_ioctl(struct file *file, unsigned 
int cmd,
switch (cmd) {
case NTSYNC_IOC_SEM_POST:
return ntsync_sem_post(obj, argp);
+   case NTSYNC_IOC_MUTEX_UNLOCK:
+   return ntsync_mutex_unlock(obj, argp);
default:
return -ENOIOCTLCMD;
}
diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h
index cd7841cdba49..fa2c9f638d77 100644
--- a/include/uapi/linux/ntsync.h
+++ b/include/uapi/linux/ntsync.h
@@ -43,5 +43,6 @@ struct ntsync_wait_args {
 #define NTSYNC_IOC_CREATE_MUTEX    _IOWR('N', 0x84, struct ntsync_sem_args)
 
 #define NTSYNC_IOC_SEM_POST        _IOWR('N', 0x81, __u32)
+#define NTSYNC_IOC_MUTEX_UNLOCK    _IOWR('N', 0x85, struct ntsync_mutex_args)
 
 #endif
-- 
2.43.0




[PATCH net-next v2 6/6] selftests: drv-net: add a trivial ping test

2024-04-15 Thread Jakub Kicinski
Add a very simple test for testing with a remote system.
Both IPv4 and IPv6 connectivity are optional, so tests
will XFail if the env doesn't define an address for the given
family.

Using netdevsim:

 $ ./run_kselftest.sh -t drivers/net:ping.py
 TAP version 13
 1..1
 # timeout set to 45
 # selftests: drivers/net: ping.py
 # KTAP version 1
 # 1..2
 # ok 1 ping.ping_v4
 # ok 2 ping.ping_v6
 # # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
 ok 1 selftests: drivers/net: ping.py

Command line SSH:

 $ NETIF=virbr0 REMOTE_TYPE=ssh REMOTE_ARGS=root@192.168.122.123 \
LOCAL_V4=192.168.122.1 REMOTE_V4=192.168.122.123 \
./tools/testing/selftests/drivers/net/ping.py
 KTAP version 1
 1..2
 ok 1 ping.ping_v4
 ok 2 ping.ping_v6 # XFAIL
 # Totals: pass:1 fail:0 xfail:1 xpass:0 skip:0 error:0

Existing devices placed in netns (and using net.config):

 $ cat drivers/net/net.config
 NETIF=veth0
 REMOTE_TYPE=netns
 REMOTE_ARGS=red
 LOCAL_V4="192.168.1.1"
 REMOTE_V4="192.168.1.2"

 $ ./run_kselftest.sh -t drivers/net:ping.py
 TAP version 13
 1..1
 # timeout set to 45
 # selftests: drivers/net: ping.py
 # KTAP version 1
 # 1..2
 # ok 1 ping.ping_v4
 # ok 2 ping.ping_v6 # XFAIL
 # # Totals: pass:1 fail:0 xfail:1 xpass:0 skip:0 error:0

Signed-off-by: Jakub Kicinski 
---
 tools/testing/selftests/drivers/net/Makefile |  5 ++-
 tools/testing/selftests/drivers/net/ping.py  | 32 
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/drivers/net/ping.py

diff --git a/tools/testing/selftests/drivers/net/Makefile 
b/tools/testing/selftests/drivers/net/Makefile
index 379cdb1960a7..754ec643768a 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -2,6 +2,9 @@
 
 TEST_INCLUDES := $(wildcard lib/py/*.py)
 
-TEST_PROGS := stats.py
+TEST_PROGS := \
+   ping.py \
+   stats.py \
+# end of TEST_PROGS
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/drivers/net/ping.py 
b/tools/testing/selftests/drivers/net/ping.py
new file mode 100755
index ..2d74f15a52a0
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/ping.py
@@ -0,0 +1,32 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+from lib.py import ksft_run, KsftXfailEx
+from lib.py import NetDrvEpEnv
+from lib.py import cmd
+
+
+def ping_v4(cfg) -> None:
+if not cfg.v4:
+raise KsftXfailEx()
+
+cmd(f"ping -c 1 -W0.5 {cfg.remote_v4}")
+cmd(f"ping -c 1 -W0.5 {cfg.v4}", host=cfg.remote)
+
+
+def ping_v6(cfg) -> None:
+if not cfg.v6:
+raise KsftXfailEx()
+
+cmd(f"ping -c 1 -W0.5 {cfg.remote_v6}")
+cmd(f"ping -c 1 -W0.5 {cfg.v6}", host=cfg.remote)
+
+
+def main() -> None:
+with NetDrvEpEnv(__file__) as cfg:
+ksft_run([ping_v4, ping_v6],
+ args=(cfg, ))
+
+
+if __name__ == "__main__":
+main()
-- 
2.44.0




[PATCH net-next v2 5/6] selftests: drv-net: construct environment for running tests which require an endpoint

2024-04-15 Thread Jakub Kicinski
Nothing surprising here, hopefully. Wrap the variables from
the environment into a class or spawn a netdevsim based env
and pass it to the tests.

Signed-off-by: Jakub Kicinski 
---
 .../testing/selftests/drivers/net/README.rst  | 33 +++
 .../selftests/drivers/net/lib/py/env.py   | 98 ++-
 .../testing/selftests/net/lib/py/__init__.py  |  1 +
 tools/testing/selftests/net/lib/py/netns.py   | 31 ++
 4 files changed, 162 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/lib/py/netns.py

diff --git a/tools/testing/selftests/drivers/net/README.rst 
b/tools/testing/selftests/drivers/net/README.rst
index 5ef7c417d431..0cbab33dad1f 100644
--- a/tools/testing/selftests/drivers/net/README.rst
+++ b/tools/testing/selftests/drivers/net/README.rst
@@ -23,8 +23,41 @@ Variables can be set in the environment or by creating a 
net.config
   # Variable set in a file
   NETIF=eth0
 
+Please note that the config parser is very simple: if there are
+any non-alphanumeric characters in the value, it needs to be in
+double quotes.
+
 NETIF
 ~
 
 Name of the netdevice against which the test should be executed.
 When empty or not set software devices will be used.
+
+LOCAL_V4, LOCAL_V6, REMOTE_V4, REMOTE_V6
+
+
+Local and remote endpoint IP addresses.
+
+REMOTE_TYPE
+~~~
+
+Communication method used to run commands on the remote endpoint.
+Test framework has built-in support for ``netns`` and ``ssh`` channels.
+``netns`` assumes the "remote" interface is part of the same
+host, just moved to the specified netns.
+``ssh`` communicates with remote endpoint over ``ssh`` and ``scp``.
+Using persistent SSH connections is strongly encouraged to avoid
+the latency of SSH connection setup on every command.
+
+Communication methods are defined by classes in ``lib/py/remote_{name}.py``.
+It should be possible to add a new method without modifying any of
+the framework, by simply adding an appropriately named file to ``lib/py``.
+
+REMOTE_ARGS
+~~~
+
+Arguments used to construct the communication channel.
+Communication channel dependent::
+
+  for netns - name of the "remote" namespace
+  for ssh - name/address of the remote host
diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py 
b/tools/testing/selftests/drivers/net/lib/py/env.py
index a081e168f3db..d6d1ec8f3a77 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -4,7 +4,8 @@ import os
 import shlex
 from pathlib import Path
 from lib.py import ip
-from lib.py import NetdevSimDev
+from lib.py import NetNS, NetdevSimDev
+from .remote import Remote
 
 
 def _load_env_file(src_path):
@@ -59,3 +60,98 @@ from lib.py import NetdevSimDev
 self._ns = None
 
 
+class NetDrvEpEnv:
+"""
+Class for an environment with a local device and "remote endpoint"
+which can be used to send traffic in.
+
+For local testing it creates two network namespaces and a pair
+of netdevsim devices.
+"""
+
+# Network prefixes used for local tests
+nsim_v4_pfx = "192.0.2."
+nsim_v6_pfx = "2001:db8::"
+
+def __init__(self, src_path):
+
+self.env = _load_env_file(src_path)
+
+# Things we try to destroy
+self.remote = None
+# These are for local testing state
+self._netns = None
+self._ns = None
+self._ns_peer = None
+
+if "NETIF" in self.env:
+self.dev = ip("link show dev " + self.env['NETIF'], json=True)[0]
+
+self.v4 = self.env.get("LOCAL_V4")
+self.v6 = self.env.get("LOCAL_V6")
+self.remote_v4 = self.env.get("REMOTE_V4")
+self.remote_v6 = self.env.get("REMOTE_V6")
+kind = self.env["REMOTE_TYPE"]
+args = self.env["REMOTE_ARGS"]
+else:
+self.create_local()
+
+self.dev = self._ns.nsims[0].dev
+
+self.v4 = self.nsim_v4_pfx + "1"
+self.v6 = self.nsim_v6_pfx + "1"
+self.remote_v4 = self.nsim_v4_pfx + "2"
+self.remote_v6 = self.nsim_v6_pfx + "2"
+kind = "netns"
+args = self._netns.name
+
+self.remote = Remote(kind, args)
+
+self.addr = self.v6 if self.v6 else self.v4
+self.remote_addr = self.remote_v6 if self.remote_v6 else self.remote_v4
+
+self.ifname = self.dev['ifname']
+self.ifindex = self.dev['ifindex']
+
+def create_local(self):
+self._netns = NetNS()
+self._ns = NetdevSimDev()
+self._ns_peer = NetdevSimDev(ns=self._netns)
+
+with open("/proc/self/ns/net") as nsfd0, \
+ open("/var/run/netns/" + self._netns.name) as nsfd1:
+ifi0 = self._ns.nsims[0].ifindex
+ifi1 = self._ns_peer.nsims[0].ifindex
+NetdevSimDev.ctrl_write('link_device',
+f'{nsfd0.fileno()}:{ifi0} 

[PATCH net-next v2 4/6] selftests: drv-net: factor out parsing of the env

2024-04-15 Thread Jakub Kicinski
The tests with a remote end will use a different class,
for clarity, but will also need to parse the env.
So factor parsing the env out to a function.

Signed-off-by: Jakub Kicinski 
---
 .../selftests/drivers/net/lib/py/env.py   | 43 +++
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py 
b/tools/testing/selftests/drivers/net/lib/py/env.py
index e1abe9491daf..a081e168f3db 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -6,12 +6,36 @@ from pathlib import Path
 from lib.py import ip
 from lib.py import NetdevSimDev
 
+
+def _load_env_file(src_path):
+env = os.environ.copy()
+
+src_dir = Path(src_path).parent.resolve()
+if not (src_dir / "net.config").exists():
+return env
+
+lexer = shlex.shlex(open((src_dir / "net.config").as_posix(), 'r').read())
+k = None
+for token in lexer:
+if k is None:
+k = token
+env[k] = ""
+elif token == "=":
+pass
+else:
+env[k] = token
+k = None
+return env
+
+
 class NetDrvEnv:
+"""
+Class for a single NIC / host env, with no remote end
+"""
 def __init__(self, src_path):
 self._ns = None
 
-self.env = os.environ.copy()
-self._load_env_file(src_path)
+self.env = _load_env_file(src_path)
 
 if 'NETIF' in self.env:
 self.dev = ip("link show dev " + self.env['NETIF'], json=True)[0]
@@ -34,19 +58,4 @@ from lib.py import NetdevSimDev
 self._ns.remove()
 self._ns = None
 
-def _load_env_file(self, src_path):
-src_dir = Path(src_path).parent.resolve()
-if not (src_dir / "net.config").exists():
-return
 
-lexer = shlex.shlex(open((src_dir / "net.config").as_posix(), 
'r').read())
-k = None
-for token in lexer:
-if k is None:
-k = token
-self.env[k] = ""
-elif token == "=":
-pass
-else:
-self.env[k] = token
-k = None
-- 
2.44.0




[PATCH net-next v2 3/6] selftests: drv-net: define endpoint structures

2024-04-15 Thread Jakub Kicinski
Define the remote endpoint "model". To execute most meaningful device
driver tests we need to be able to communicate with a remote system,
and have it send traffic to the device under test.

Various test environments will have different requirements.

0) "Local" netdevsim-based testing can simply use net namespaces.
netdevsim supports connecting two devices now, to form a veth-like
construct.

1) Similarly on hosts with multiple NICs, the NICs may be connected
together with a loopback cable or internal device loopback.
One interface may be placed into separate netns, and tests
would proceed much like in the netdevsim case. Note that
the loopback config or the moving of one interface
into a netns is not expected to be part of selftest code.

2) Some systems may need to communicate with the remote endpoint
via SSH.

3) Last but not least environment may have its own custom communication
method.

Fundamentally we only need two operations:
 - run a command remotely
 - deploy a binary (if some tool we need is built as part of kselftests)

Wrap these two in a class. Use dynamic loading to load the Remote
class. This will allow very easy definition of other communication
methods without bothering upstream code base.

Stick to the "simple" / "no unnecessary abstractions" model for
referring to the remote endpoints. The host / remote object are
passed as an argument to the usual cmd() or ip() invocation.
For example:

 ip("link show", json=True, host=remote)

Signed-off-by: Jakub Kicinski 
---
 .../selftests/drivers/net/lib/py/__init__.py  |  1 +
 .../selftests/drivers/net/lib/py/remote.py| 13 +++
 .../drivers/net/lib/py/remote_netns.py| 15 
 .../drivers/net/lib/py/remote_ssh.py  | 34 +++
 tools/testing/selftests/net/lib/py/utils.py   | 19 ++-
 5 files changed, 73 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote.py
 create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote_netns.py
 create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote_ssh.py

diff --git a/tools/testing/selftests/drivers/net/lib/py/__init__.py 
b/tools/testing/selftests/drivers/net/lib/py/__init__.py
index 4653dffcd962..4789c1a4282d 100644
--- a/tools/testing/selftests/drivers/net/lib/py/__init__.py
+++ b/tools/testing/selftests/drivers/net/lib/py/__init__.py
@@ -15,3 +15,4 @@ KSFT_DIR = (Path(__file__).parent / "../../../..").resolve()
 sys.exit(4)
 
 from .env import *
+from .remote import Remote
diff --git a/tools/testing/selftests/drivers/net/lib/py/remote.py 
b/tools/testing/selftests/drivers/net/lib/py/remote.py
new file mode 100644
index ..d86b997d27d4
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/lib/py/remote.py
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+
+import importlib
+
+_modules = {}
+
+def Remote(kind, args):
+global _modules
+
+if kind not in _modules:
+_modules[kind] = importlib.import_module("..remote_" + kind, __name__)
+
+return getattr(_modules[kind], "Remote")(args)
diff --git a/tools/testing/selftests/drivers/net/lib/py/remote_netns.py 
b/tools/testing/selftests/drivers/net/lib/py/remote_netns.py
new file mode 100644
index ..7d8ab7a1bf92
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/lib/py/remote_netns.py
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0
+
+from lib.py import cmd
+
+
+class Remote:
+def __init__(self, name):
+self.name = name
+
+def cmd(self, *args):
+c = cmd(*args, ns=self.name)
+return c.stdout, c.stderr, c.ret
+
+def deploy(self, what):
+return what
diff --git a/tools/testing/selftests/drivers/net/lib/py/remote_ssh.py 
b/tools/testing/selftests/drivers/net/lib/py/remote_ssh.py
new file mode 100644
index ..c056b15991ff
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/lib/py/remote_ssh.py
@@ -0,0 +1,34 @@
+# SPDX-License-Identifier: GPL-2.0
+
+import os
+import shlex
+import string
+import random
+
+from lib.py import cmd
+
+
+class Remote:
+def __init__(self, name):
+self.name = name
+self._tmpdir = None
+
+def __del__(self):
+if self._tmpdir:
+self.cmd("rm -rf " + self._tmpdir)
+self._tmpdir = None
+
+def cmd(self, comm, *args):
+c = cmd("ssh " + self.name + " " + shlex.quote(comm), *args)
+return c.stdout, c.stderr, c.ret
+
+def _mktmp(self):
+return ''.join(random.choice(string.ascii_lowercase) for _ in range(8))
+
+def deploy(self, what):
+if not self._tmpdir:
+self._tmpdir = "/tmp/" + self._mktmp()
+self.cmd("mkdir " + self._tmpdir)
+file_name = self._tmpdir + "/" + self._mktmp() + os.path.basename(what)
+cmd(f"scp {what} {self.name}:{file_name}")
+return file_name
diff --git a/tools/testing/selftests/net/lib/py/utils.py 
b/tools/testing/selftests/net/lib/py/utils.py
index 

[PATCH net-next v2 2/6] selftests: drv-net: add config for netdevsim

2024-04-15 Thread Jakub Kicinski
Real driver testing will obviously require enabling more
options, but will require more manual setup in the first
place. For CIs running purely software tests we need
to enable netdevsim.

Signed-off-by: Jakub Kicinski 
---
 tools/testing/selftests/drivers/net/config | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/net/config

diff --git a/tools/testing/selftests/drivers/net/config 
b/tools/testing/selftests/drivers/net/config
new file mode 100644
index ..f6a58ce8a230
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/config
@@ -0,0 +1,2 @@
+CONFIG_IPV6=y
+CONFIG_NETDEVSIM=m
-- 
2.44.0




[PATCH net-next v2 1/6] selftests: drv-net: add stdout to the command failed exception

2024-04-15 Thread Jakub Kicinski
ping prints all of its info to stdout. To make debugging easier, capture
stdout in the Exception raised when a command unexpectedly fails.

Signed-off-by: Jakub Kicinski 
---
 tools/testing/selftests/net/lib/py/utils.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/lib/py/utils.py 
b/tools/testing/selftests/net/lib/py/utils.py
index f0d425731fd4..19612348c30d 100644
--- a/tools/testing/selftests/net/lib/py/utils.py
+++ b/tools/testing/selftests/net/lib/py/utils.py
@@ -33,7 +33,8 @@ import subprocess
 if self.proc.returncode != 0 and fail:
 if len(stderr) > 0 and stderr[-1] == "\n":
 stderr = stderr[:-1]
-raise Exception("Command failed: %s\n%s" % (self.proc.args, 
stderr))
+raise Exception("Command failed: %s\nSTDOUT: %s\nSTDERR: %s" %
+(self.proc.args, stdout, stderr))
 
 
 def ip(args, json=None, ns=None):
-- 
2.44.0




[PATCH net-next v2 0/6] selftests: drv-net: support testing with a remote system

2024-04-15 Thread Jakub Kicinski
Hi!

Implement support for tests which require access to a remote system /
endpoint which can generate traffic.
This series concludes the "groundwork" for upstream driver tests.

I wanted to support the three models which came up in discussions:
 - SW testing with netdevsim
 - "local" testing with two ports on the same system in a loopback
 - "remote" testing via SSH
so there is a tiny bit of an abstraction which wraps up how "remote"
commands are executed. Otherwise hopefully there's nothing surprising.

I'm only adding a ping test. I had a bigger one written but I was
worried we'd get into discussing the details of the test itself
and how I chose to hack up netdevsim, instead of the test infra...
So that test will be a follow up :)

v2:
 - rename endpoint -> remote
 - use 2001:db8:: v6 prefix
 - add a note about persistent SSH connections
 - add the kernel config
v1: https://lore.kernel.org/all/20240412233705.1066444-1-k...@kernel.org

Jakub Kicinski (6):
  selftests: drv-net: add stdout to the command failed exception
  selftests: drv-net: add config for netdevsim
  selftests: drv-net: define endpoint structures
  selftests: drv-net: factor out parsing of the env
  selftests: drv-net: construct environment for running tests which
require an endpoint
  selftests: drv-net: add a trivial ping test

 tools/testing/selftests/drivers/net/Makefile  |   5 +-
 .../testing/selftests/drivers/net/README.rst  |  33 
 tools/testing/selftests/drivers/net/config|   2 +
 .../selftests/drivers/net/lib/py/__init__.py  |   1 +
 .../selftests/drivers/net/lib/py/env.py   | 141 +++---
 .../selftests/drivers/net/lib/py/remote.py|  13 ++
 .../drivers/net/lib/py/remote_netns.py|  15 ++
 .../drivers/net/lib/py/remote_ssh.py  |  34 +
 tools/testing/selftests/drivers/net/ping.py   |  32 
 .../testing/selftests/net/lib/py/__init__.py  |   1 +
 tools/testing/selftests/net/lib/py/netns.py   |  31 
 tools/testing/selftests/net/lib/py/utils.py   |  22 +--
 12 files changed, 301 insertions(+), 29 deletions(-)
 create mode 100644 tools/testing/selftests/drivers/net/config
 create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote.py
 create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote_netns.py
 create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote_ssh.py
 create mode 100755 tools/testing/selftests/drivers/net/ping.py
 create mode 100644 tools/testing/selftests/net/lib/py/netns.py

-- 
2.44.0




Re: [PATCH v10 3/5] selftest mm/mseal memory sealing

2024-04-15 Thread Kees Cook
On Mon, Apr 15, 2024 at 01:27:32PM -0700, Jeff Xu wrote:
> On Mon, Apr 15, 2024 at 11:32 AM Muhammad Usama Anjum
>  wrote:
> >
> > Please fix following for this and fifth patch as well:
> >
> > --> checkpatch.pl --codespell tools/testing/selftests/mm/mseal_test.c
> >
> > WARNING: Macros with flow control statements should be avoided
> > #42: FILE: tools/testing/selftests/mm/mseal_test.c:42:
> > +#define FAIL_TEST_IF_FALSE(c) do {\
> > +   if (!(c)) {\
> > +   ksft_test_result_fail("%s, line:%d\n", __func__,
> > __LINE__);\
> > +   goto test_end;\
> > +   } \
> > +   } \
> > +   while (0)
> >
> > WARNING: Macros with flow control statements should be avoided
> > #50: FILE: tools/testing/selftests/mm/mseal_test.c:50:
> > +#define SKIP_TEST_IF_FALSE(c) do {\
> > +   if (!(c)) {\
> > +   ksft_test_result_skip("%s, line:%d\n", __func__,
> > __LINE__);\
> > +   goto test_end;\
> > +   } \
> > +   } \
> > +   while (0)
> >
> > WARNING: Macros with flow control statements should be avoided
> > #59: FILE: tools/testing/selftests/mm/mseal_test.c:59:
> > +#define TEST_END_CHECK() {\
> > +   ksft_test_result_pass("%s\n", __func__);\
> > +   return;\
> > +test_end:\
> > +   return;\
> > +}
> >
> I tried to fix those checkpatch warnings in the past, but found no good
> solution. If I put the condition check in the test, the code will have
> too many "if"s, which decreases readability.  If there is a better
> solution, I'm happy to do that; suggestions are welcome.

Yeah, these are more "conventions" from checkpatch. I think it's fine to
ignore this warning, especially for selftests.

-- 
Kees Cook



Re: [PATCH V2] KVM: selftests: Take large C-state exit latency into consideration

2024-04-15 Thread Sean Christopherson
On Fri, Apr 12, 2024, Zide Chen wrote:
> Currently, the migration worker delays 1-10 us, assuming that one
> KVM_RUN iteration only takes a few microseconds.  But if C-state exit
> latencies are large enough, for example, hundreds or even thousands
> of microseconds on server CPUs, it may happen that it's not able to
> bring the target CPU out of C-state before the migration worker starts
> to migrate it to the next CPU.
> 
> If the system workload is light, most CPUs could be at a certain level
> of C-state, which may result in less successful migrations and fail the
> migration/KVM_RUN ratio sanity check.
> 
> This patch adds a command line option to skip the sanity check in
> this case.
> 
> Additionally, seems it's reasonable to randomize the length of usleep(),
> other than delay in a fixed pattern.

This belongs in a separate patch.  And while it's reasonable on the surface, I
don't think it buys us anything, and it makes an already non-deterministic test
even less deterministic.  In other words, unless a random sleep time helps find
more bugs or finds the original bug faster, just drop the randomization.

> V2:
> - removed the busy loop implementation
> - add the new "-s" option

This belongs in the ignored part of the patch...
> 
> Signed-off-by: Zide Chen 

...down here.

> ---
>  tools/testing/selftests/kvm/rseq_test.c | 37 +++--
>  1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/rseq_test.c 
> b/tools/testing/selftests/kvm/rseq_test.c
> index 28f97fb52044..515cfa32a925 100644
> --- a/tools/testing/selftests/kvm/rseq_test.c
> +++ b/tools/testing/selftests/kvm/rseq_test.c
> @@ -150,7 +150,7 @@ static void *migration_worker(void *__rseq_tid)
>* Use usleep() for simplicity and to avoid unnecessary kernel
>* dependencies.
>*/
> - usleep((i % 10) + 1);
> + usleep((rand() % 10) + 1);
>   }
>   done = true;
>   return NULL;
> @@ -186,12 +186,35 @@ static void calc_min_max_cpu(void)
>  "Only one usable CPU, task migration not possible");
>  }
>  
> +static void usage(const char *name)

Uber nit, "help()" is more common than "usage()".

> @@ -254,9 +279,15 @@ int main(int argc, char *argv[])
>* getcpu() to stabilize.  A 2:1 migration:KVM_RUN ratio is a fairly
>* conservative ratio on x86-64, which can do _more_ KVM_RUNs than
>* migrations given the 1us+ delay in the migration task.
> +  *
> +  * Another reason why it may have small migration:KVM_RUN ratio is that,
> +  * on systems with large C-state exit latency, it may happen quite often
> +  * that the scheduler is not able to wake up the target CPU before the
> +  * vCPU thread is scheduled to another CPU.
>*/
> - TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
> - "Only performed %d KVM_RUNs, task stalled too much?", i);
> + TEST_ASSERT(skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2),
> + "Only performed %d KVM_RUNs, task stalled too much? "
> + "Try to turn off C-states or run it with the -s option", i);

I think it's worth explicitly telling the user how to reduce CPU wakeup latency.
Also, are C-states called that on other architectures?  E.g. maybe something
like this instead?  Not a big deal, I've no objection whatsoever to the
comment, but it seems easy enough to avoid confusing the user.

"Try setting /dev/cpu_dma_latency to reduce CPU wakeup 
latency, "
"or run with -s to skip this sanity check", i);



Re: [PATCH v4 01/14] lib: Add TLV parser

2024-04-15 Thread Randy Dunlap



On 4/15/24 12:19 PM, Jarkko Sakkinen wrote:
> On Mon Apr 15, 2024 at 5:24 PM EEST, Roberto Sassu wrote:
>> From: Roberto Sassu 
>>
>> Add a parser of a generic TLV format:
> 
> What is TLV?

type-length-value

i.e., a descriptor that contains a value.

IIUC.
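
For illustration, a minimal sketch of a type-length-value encoding in Python
(the function names, field widths and endianness here are assumptions; the
actual on-disk format is the one defined in the patch):

import struct

def tlv_encode(entry_type, value):
    # 16-bit type, 32-bit length, followed by the raw value bytes
    return struct.pack(">HI", entry_type, len(value)) + value

def tlv_decode(buf):
    entries = []
    offset = 0
    while offset < len(buf):
        entry_type, length = struct.unpack_from(">HI", buf, offset)
        offset += struct.calcsize(">HI")
        entries.append((entry_type, buf[offset:offset + length]))
        offset += length
    return entries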

-- 
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html



Re: [PATCH v10 3/5] selftest mm/mseal memory sealing

2024-04-15 Thread Jeff Xu
On Mon, Apr 15, 2024 at 11:32 AM Muhammad Usama Anjum
 wrote:
>
> Please fix following for this and fifth patch as well:
>
> --> checkpatch.pl --codespell tools/testing/selftests/mm/mseal_test.c
>
> WARNING: Macros with flow control statements should be avoided
> #42: FILE: tools/testing/selftests/mm/mseal_test.c:42:
> +#define FAIL_TEST_IF_FALSE(c) do {\
> +   if (!(c)) {\
> +   ksft_test_result_fail("%s, line:%d\n", __func__,
> __LINE__);\
> +   goto test_end;\
> +   } \
> +   } \
> +   while (0)
>
> WARNING: Macros with flow control statements should be avoided
> #50: FILE: tools/testing/selftests/mm/mseal_test.c:50:
> +#define SKIP_TEST_IF_FALSE(c) do {\
> +   if (!(c)) {\
> +   ksft_test_result_skip("%s, line:%d\n", __func__,
> __LINE__);\
> +   goto test_end;\
> +   } \
> +   } \
> +   while (0)
>
> WARNING: Macros with flow control statements should be avoided
> #59: FILE: tools/testing/selftests/mm/mseal_test.c:59:
> +#define TEST_END_CHECK() {\
> +   ksft_test_result_pass("%s\n", __func__);\
> +   return;\
> +test_end:\
> +   return;\
> +}
>
I tried to fix those checkpatch warnings in the past, but found no good
solution. If I put the condition check in the test, the code will have
too many "if"s, which decreases readability.  If there is a better
solution, I'm happy to do that; suggestions are welcome.

>
> On 4/15/24 9:35 PM, jef...@chromium.org wrote:
> > From: Jeff Xu 
> >
> > selftest for memory sealing change in mmap() and mseal().
> >
> > Signed-off-by: Jeff Xu 
> > ---
> >  tools/testing/selftests/mm/.gitignore   |1 +
> >  tools/testing/selftests/mm/Makefile |1 +
> >  tools/testing/selftests/mm/mseal_test.c | 1836 +++
> >  3 files changed, 1838 insertions(+)
> >  create mode 100644 tools/testing/selftests/mm/mseal_test.c
> >
> > diff --git a/tools/testing/selftests/mm/.gitignore 
> > b/tools/testing/selftests/mm/.gitignore
> > index d26e962f2ac4..98eaa4590f11 100644
> > --- a/tools/testing/selftests/mm/.gitignore
> > +++ b/tools/testing/selftests/mm/.gitignore
> > @@ -47,3 +47,4 @@ mkdirty
> >  va_high_addr_switch
> >  hugetlb_fault_after_madv
> >  hugetlb_madv_vs_map
> > +mseal_test
> > diff --git a/tools/testing/selftests/mm/Makefile 
> > b/tools/testing/selftests/mm/Makefile
> > index eb5f39a2668b..95d10fe1b3c1 100644
> > --- a/tools/testing/selftests/mm/Makefile
> > +++ b/tools/testing/selftests/mm/Makefile
> > @@ -59,6 +59,7 @@ TEST_GEN_FILES += mlock2-tests
> >  TEST_GEN_FILES += mrelease_test
> >  TEST_GEN_FILES += mremap_dontunmap
> >  TEST_GEN_FILES += mremap_test
> > +TEST_GEN_FILES += mseal_test
> >  TEST_GEN_FILES += on-fault-limit
> >  TEST_GEN_FILES += pagemap_ioctl
> >  TEST_GEN_FILES += thuge-gen
> > diff --git a/tools/testing/selftests/mm/mseal_test.c 
> > b/tools/testing/selftests/mm/mseal_test.c
> > new file mode 100644
> > index ..06c780d1d8e5
> > --- /dev/null
> > +++ b/tools/testing/selftests/mm/mseal_test.
> > +static void __write_pkey_reg(u64 pkey_reg)
> > +{
> > +#if defined(__i386__) || defined(__x86_64__) /* arch */
> > + unsigned int eax = pkey_reg;
> > + unsigned int ecx = 0;
> > + unsigned int edx = 0;
> > +
> > + asm volatile(".byte 0x0f,0x01,0xef\n\t"
> > + : : "a" (eax), "c" (ecx), "d" (edx));
> > + assert(pkey_reg == __read_pkey_reg());
> Use ksft_exit_fail_msg() instead of assert() to stay inside the TAP format
> when the condition is false and an error is generated.
>
I can remove the usage of assert() from the test.

> > +int main(int argc, char **argv)
> > +{
> > + bool test_seal = seal_support();
> > +
> > + ksft_print_header();
> > +
> > + if (!test_seal)
> > + ksft_exit_skip("sealing not supported, check CONFIG_64BIT\n");
> > +
> > + if (!pkey_supported())
> > + ksft_print_msg("PKEY not supported\n");
> > +
> > + ksft_set_plan(80);
> > +
> > + test_seal_addseal();
> > + test_seal_unmapped_start();
> > + test_seal_unmapped_middle();
> > + test_seal_unmapped_end();
> > + test_seal_multiple_vmas();
> > + test_seal_split_start();
> > + test_seal_split_end();
> > + test_seal_invalid_input();
> > + test_seal_zero_length();
> > + test_seal_twice();
> > +
> > + test_seal_mprotect(false);
> > + test_seal_mprotect(true);
> > +
> > + test_seal_start_mprotect(false);
> > + test_seal_start_mprotect(true);
> > +
> > + test_seal_end_mprotect(false);
> > + test_seal_end_mprotect(true);
> > +
> > + test_seal_mprotect_unalign_len(false);
> > + test_seal_mprotect_unalign_len(true);
> > +
> > + test_seal_mprotect_unalign_len_variant_2(false);
> > + test_seal_mprotect_unalign_len_variant_2(true);
> > +
> > + test_seal_mprotect_two_vma(false);
> > + test_seal_mprotect_two_vma(true);
> > +
> > + 

Re: [PATCH net-next 1/5] selftests: drv-net: define endpoint structures

2024-04-15 Thread Petr Machata


Willem de Bruijn  writes:

> 1. Cleaning up remote state in all conditions, including timeout/kill.
>
>Some tests require a setup phase before the test, and a matching
>cleanup phase. If any of the configured state is variable (even
>just a randomized filepath) this needs to be communicated to the
>cleanup phase. The remote filepath is handled well here. But what if
>a test needs per-test setup? Say, changing the MTU or an ethtool feature.
>Multiple related tests may want to share a setup/cleanup.

Personally I like to wrap responsibilities of this sort in context
managers, e.g. something along these lines:

class changed_mtu:
def __init__(self, dev, mtu):
self.dev = dev
self.mtu = mtu

def __enter__(self):
js = cmd(f"ip -j link show dev {self.dev}", json=True)
        self.orig_mtu = js[0]["mtu"]  # current MTU from the JSON output
cmd(f"ip link set dev {self.dev} mtu {self.mtu}")

def __exit__(self, type, value, traceback):
cmd(f"ip link set dev {self.dev} mtu {self.orig_mtu}")

with changed_mtu(swp1, 10000):
    # MTU is 10K here
# and back to 1500

A lot of this can be made generic, where some object is given setup /
cleanup commands and just invokes those. But things like MTU, ethtool
speed, sysctls and what have you that need to save a previous state and
revert back to it will probably need a custom handler. Like we have them
in lib.sh as well.
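
A rough sketch of that generic variant (illustration only; it assumes the
cmd() wrapper from net/lib/py/utils.py, and the interface name and ethtool
feature below are just placeholders):

class setup_cleanup:
    """Run a setup command on enter and the matching cleanup on exit."""
    def __init__(self, setup, cleanup):
        self.setup = setup
        self.cleanup = cleanup

    def __enter__(self):
        cmd(self.setup)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # runs even if the test body raised, so the state is not leaked
        cmd(self.cleanup)

with setup_cleanup(f"ethtool -K {ifname} rx-gro-hw on",
                   f"ethtool -K {ifname} rx-gro-hw off"):
    ...  # test body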



Re: [PATCH v4 13/14] selftests/digest_cache: Add selftests for digest_cache LSM

2024-04-15 Thread Jarkko Sakkinen
On Mon Apr 15, 2024 at 5:24 PM EEST, Roberto Sassu wrote:
> From: Roberto Sassu 
>
> Add tests to verify the correctness of the digest_cache LSM, in all_test.c.
>
> Add the kernel module digest_cache_kern.ko, to let all_test call the API
> of the digest_cache LSM through the newly introduced digest_cache_test file
> in securityfs.
>
> Test coverage information:
>
> File 'security/digest_cache/notifier.c'
> Lines executed:100.00% of 31
> File 'security/digest_cache/reset.c'
> Lines executed:98.36% of 61
> File 'security/digest_cache/main.c'
> Lines executed:90.29% of 206
> File 'security/digest_cache/modsig.c'
> Lines executed:42.86% of 21
> File 'security/digest_cache/htable.c'
> Lines executed:93.02% of 86
> File 'security/digest_cache/populate.c'
> Lines executed:92.86% of 56
> File 'security/digest_cache/verif.c'
> Lines executed:89.74% of 39
> File 'security/digest_cache/dir.c'
> Lines executed:90.62% of 96
> File 'security/digest_cache/secfs.c'
> Lines executed:57.14% of 21
> File 'security/digest_cache/parsers/tlv.c'
> Lines executed:79.75% of 79
> File 'security/digest_cache/parsers/rpm.c'
> Lines executed:88.46% of 78
>
> Signed-off-by: Roberto Sassu 
> ---
>  MAINTAINERS   |   1 +
>  tools/testing/selftests/Makefile  |   1 +
>  .../testing/selftests/digest_cache/.gitignore |   3 +
>  tools/testing/selftests/digest_cache/Makefile |  24 +
>  .../testing/selftests/digest_cache/all_test.c | 815 ++
>  tools/testing/selftests/digest_cache/common.c |  78 ++
>  tools/testing/selftests/digest_cache/common.h | 135 +++
>  .../selftests/digest_cache/common_user.c  |  47 +
>  .../selftests/digest_cache/common_user.h  |  17 +
>  tools/testing/selftests/digest_cache/config   |   1 +
>  .../selftests/digest_cache/generators.c   | 248 ++
>  .../selftests/digest_cache/generators.h   |  19 +
>  .../selftests/digest_cache/testmod/Makefile   |  16 +
>  .../selftests/digest_cache/testmod/kern.c | 564 
>  14 files changed, 1969 insertions(+)
>  create mode 100644 tools/testing/selftests/digest_cache/.gitignore
>  create mode 100644 tools/testing/selftests/digest_cache/Makefile
>  create mode 100644 tools/testing/selftests/digest_cache/all_test.c
>  create mode 100644 tools/testing/selftests/digest_cache/common.c
>  create mode 100644 tools/testing/selftests/digest_cache/common.h
>  create mode 100644 tools/testing/selftests/digest_cache/common_user.c
>  create mode 100644 tools/testing/selftests/digest_cache/common_user.h
>  create mode 100644 tools/testing/selftests/digest_cache/config
>  create mode 100644 tools/testing/selftests/digest_cache/generators.c
>  create mode 100644 tools/testing/selftests/digest_cache/generators.h
>  create mode 100644 tools/testing/selftests/digest_cache/testmod/Makefile
>  create mode 100644 tools/testing/selftests/digest_cache/testmod/kern.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 72801a88449c..d7f700da009e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6198,6 +6198,7 @@ M:  Roberto Sassu 
>  L:   linux-security-mod...@vger.kernel.org
>  S:   Maintained
>  F:   security/digest_cache/
> +F:   tools/testing/selftests/digest_cache/
>  
A common convention is to have one patch with the MAINTAINERS update at the
tail. This is now sprinkled across multiple patches, which is not good.

>  DIGITEQ AUTOMOTIVE MGB4 V4L2 DRIVER
>  M:   Martin Tuma 
> diff --git a/tools/testing/selftests/Makefile 
> b/tools/testing/selftests/Makefile
> index 15b6a111c3be..3c5965a62d28 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -13,6 +13,7 @@ TARGETS += core
>  TARGETS += cpufreq
>  TARGETS += cpu-hotplug
>  TARGETS += damon
> +TARGETS += digest_cache
>  TARGETS += dmabuf-heaps
>  TARGETS += drivers/dma-buf
>  TARGETS += drivers/s390x/uvdevice
> diff --git a/tools/testing/selftests/digest_cache/.gitignore 
> b/tools/testing/selftests/digest_cache/.gitignore
> new file mode 100644
> index ..392096e18f4e
> --- /dev/null
> +++ b/tools/testing/selftests/digest_cache/.gitignore
> @@ -0,0 +1,3 @@
> +/*.mod
> +/*_test
> +/*.ko
> diff --git a/tools/testing/selftests/digest_cache/Makefile 
> b/tools/testing/selftests/digest_cache/Makefile
> new file mode 100644
> index ..6b1e0d3c08cf
> --- /dev/null
> +++ b/tools/testing/selftests/digest_cache/Makefile
> @@ -0,0 +1,24 @@
> +# SPDX-License-Identifier: GPL-2.0
> +TEST_GEN_PROGS_EXTENDED = digest_cache_kern.ko
> +TEST_GEN_PROGS := all_test
> +
> +$(OUTPUT)/%.ko: $(wildcard common.[ch]) testmod/Makefile testmod/kern.c
> + $(call msg,MOD,,$@)
> + $(Q)$(MAKE) -C testmod
> + $(Q)cp testmod/digest_cache_kern.ko $@
> +
> +LOCAL_HDRS += common.h common_user.h generators.h
> +CFLAGS += -ggdb -Wall -Wextra $(KHDR_INCLUDES)
> +
> +OVERRIDE_TARGETS := 1
> +override define CLEAN
> + $(call msg,CLEAN)
> + $(Q)$(MAKE) -C testmod clean
> + rm -Rf $(TEST_GEN_PROGS)
> + rm -Rf 

Re: [PATCH v4 11/14] digest_cache: Reset digest cache on file/directory change

2024-04-15 Thread Jarkko Sakkinen
On Mon Apr 15, 2024 at 5:24 PM EEST, Roberto Sassu wrote:
> From: Roberto Sassu 
>
> Register six new LSM hooks, path_truncate, file_release, inode_unlink,
> inode_rename, inode_post_setxattr and inode_post_removexattr, to monitor
> digest lists/directory modifications.
>
> If an action affects a digest list or the parent directory, the new LSM
> hook implementations call digest_cache_reset_owner() to set the RESET bit
> (if unset) on the digest cache pointed to by dig_owner in the inode
> security blob. This will cause next calls to digest_cache_get() and
> digest_cache_create() to respectively put and clear dig_user and dig_owner,
> and request a new digest cache.
>
> If an action affects a file using a digest cache, the new LSM hook
> implementations call digest_cache_reset_user() to set the RESET_USER bit
> (if unset) on the digest cache pointed to by dig_user in the inode security
> blob. This will cause next calls to digest_cache_get() to put and clear
> dig_user, and retrieve the digest cache again.
>
> That does not affect other users of the old digest cache, since that one
> remains valid as long as the reference count is greater than zero. However,
> they will be notified in a subsequent patch about the reset, so that they
> can eventually request a new digest cache.
>
> Recreating a file digest cache means reading the digest list again and
> extracting the digests. Recreating a directory digest cache, instead, does
> not mean recreating the digest cache for directory entries, since those
> digest caches are likely already stored in the inode security blob. It
> would happen however for new files.
>
> Dig_owner reset for file digest caches is done on path_truncate, when a
> digest list is truncated (there is no inode_truncate, file_truncate does
> not catch operations through the truncate() system call), file_release,
> when a digest list opened for write is being closed, inode_unlink, when a
> digest list is removed, and inode_rename when a digest list is renamed.
>
> Dig_owner reset for directory digest caches is done on file_release, when a
> new digest list is written in the digest list directory, on inode_unlink,
> when a digest list is deleted from that directory, and finally on
> inode_rename, when a digest list is moved to/from that directory.
>
> Dig_user reset is always done on inode_post_setxattr and
> inode_post_removexattr, when the security.digest_list xattr is respectively
> set or removed from a file using a digest cache.
>
> With the exception of file_release, which will always be executed (cannot
> be denied), and inode_post_setxattr and inode_post_removexattr, which are
> executed after the actual operation, the other LSM hooks are not optimal,
> since the digest_cache LSM does not know whether or not the operation will
> be allowed also by other LSMs. If the operation is denied, the digest_cache
> LSM would do an unnecessary reset.
>
> Signed-off-by: Roberto Sassu 
> ---
>  security/digest_cache/Kconfig|   1 +
>  security/digest_cache/Makefile   |   3 +-
>  security/digest_cache/dir.c  |   6 +
>  security/digest_cache/internal.h |  14 +++
>  security/digest_cache/main.c |  19 +++
>  security/digest_cache/reset.c| 197 +++
>  6 files changed, 239 insertions(+), 1 deletion(-)
>  create mode 100644 security/digest_cache/reset.c
>
> diff --git a/security/digest_cache/Kconfig b/security/digest_cache/Kconfig
> index cb4fa44e8f2a..54ba3a585073 100644
> --- a/security/digest_cache/Kconfig
> +++ b/security/digest_cache/Kconfig
> @@ -2,6 +2,7 @@
>  config SECURITY_DIGEST_CACHE
>   bool "Digest_cache LSM"
>   select TLV_PARSER
> + select SECURITY_PATH
>   default n
>   help
> This option enables an LSM maintaining a cache of digests
> diff --git a/security/digest_cache/Makefile b/security/digest_cache/Makefile
> index e417da0383ab..3d5e600a2c45 100644
> --- a/security/digest_cache/Makefile
> +++ b/security/digest_cache/Makefile
> @@ -4,7 +4,8 @@
>  
>  obj-$(CONFIG_SECURITY_DIGEST_CACHE) += digest_cache.o
>  
> -digest_cache-y := main.o secfs.o htable.o populate.o modsig.o verif.o dir.o
> +digest_cache-y := main.o secfs.o htable.o populate.o modsig.o verif.o dir.o \
> +   reset.o
>  
>  digest_cache-y += parsers/tlv.o
>  digest_cache-y += parsers/rpm.o
> diff --git a/security/digest_cache/dir.c b/security/digest_cache/dir.c
> index a7d203c15386..937177660242 100644
> --- a/security/digest_cache/dir.c
> +++ b/security/digest_cache/dir.c
> @@ -148,6 +148,12 @@ digest_cache_dir_lookup_digest(struct dentry *dentry,
>  
>   list_for_each_entry(dir_entry, &digest_cache->dir_entries, list) {
>   mutex_lock(&dir_entry->digest_cache_mutex);
> + if (dir_entry->digest_cache &&
> + test_bit(RESET, &dir_entry->digest_cache->flags)) {
> + digest_cache_put(dir_entry->digest_cache);
> + dir_entry->digest_cache = NULL;
> + }
> +
> 

Re: [PATCH v4 10/14] digest cache: Prefetch digest lists if requested

2024-04-15 Thread Jarkko Sakkinen
On Mon Apr 15, 2024 at 5:24 PM EEST, Roberto Sassu wrote:
> From: Roberto Sassu 
>
> A desirable goal when doing integrity measurements is that they are done
> always in the same order across boots, so that the resulting PCR value
> becomes predictable and suitable for sealing policies. However, due to
> parallel execution of system services at boot, a deterministic order of
> measurements is difficult to achieve.
>
> The digest_cache LSM is not exempted from this issue. Under the assumption
> that only the digest list is measured, and file measurements are omitted if
> their digest is found in that digest list, a PCR can be predictable only if
> all files belong to the same digest list. Otherwise, it will still be
> unpredictable, since files accessed in a non-deterministic order will cause
> digest lists to be measured in a non-deterministic order too.
>
> Overcome this issue, if prefetching is enabled, by searching a digest list
> file name in digest_list_dir_lookup_filename() among the entries of the
> linked list built by digest_cache_dir_create(). If the file name does not
> match, read the digest list to trigger its measurement. Otherwise, also
> create a digest cache and return that to the caller. Release the extra
> reference of the directory digest cache in digest_cache_new(), since it was
> only used for the search and it is not going to be returned.
>
> Prefetching needs to be explicitly enabled by setting the new
> security.dig_prefetch xattr to 1 in the directory containing the digest
> lists. The newly introduced function digest_cache_prefetch_requested()
> checks first if the DIR_PREFETCH bit is set in dig_owner, otherwise it
> reads the xattr. digest_cache_create() sets DIR_PREFETCH in dig_owner, if
> prefetching is enabled, before declaring the digest cache as initialized.
>
> Signed-off-by: Roberto Sassu 
> ---
>  include/uapi/linux/xattr.h   |  3 +
>  security/digest_cache/dir.c  | 55 +-
>  security/digest_cache/internal.h | 11 +++-
>  security/digest_cache/main.c | 95 +++-
>  security/digest_cache/populate.c |  8 ++-
>  security/digest_cache/verif.c|  5 +-
>  6 files changed, 170 insertions(+), 7 deletions(-)
>
> diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h
> index 8a58cf4bce65..8af33d38d9e8 100644
> --- a/include/uapi/linux/xattr.h
> +++ b/include/uapi/linux/xattr.h
> @@ -57,6 +57,9 @@
>  #define XATTR_DIGEST_LIST_SUFFIX "digest_list"
>  #define XATTR_NAME_DIGEST_LIST XATTR_SECURITY_PREFIX XATTR_DIGEST_LIST_SUFFIX
>  
> +#define XATTR_DIG_PREFETCH_SUFFIX "dig_prefetch"
> +#define XATTR_NAME_DIG_PREFETCH XATTR_SECURITY_PREFIX 
> XATTR_DIG_PREFETCH_SUFFIX
> +
>  #define XATTR_SELINUX_SUFFIX "selinux"
>  #define XATTR_NAME_SELINUX XATTR_SECURITY_PREFIX XATTR_SELINUX_SUFFIX
>  
> diff --git a/security/digest_cache/dir.c b/security/digest_cache/dir.c
> index 7bfcdd5f7ef1..a7d203c15386 100644
> --- a/security/digest_cache/dir.c
> +++ b/security/digest_cache/dir.c
> @@ -54,6 +54,7 @@ static bool digest_cache_dir_iter(struct dir_context 
> *__ctx, const char *name,
>   new_entry->seq_num = UINT_MAX;
>   new_entry->digest_cache = NULL;
>   mutex_init(_entry->digest_cache_mutex);
> + new_entry->prefetched = false;
>  
>   if (new_entry->name[0] < '0' || new_entry->name[0] > '9')
>   goto out;
> @@ -127,6 +128,7 @@ int digest_cache_dir_create(struct digest_cache 
> *digest_cache,
>   * @digest_cache: Digest cache
>   * @digest: Digest to search
>   * @algo: Algorithm of the digest to search
> + * @filename: File name of the digest list to search
>   *
>   * This function iterates over the linked list created by
>   * digest_cache_dir_create() and looks up the digest in the digest cache of
> @@ -149,7 +151,8 @@ digest_cache_dir_lookup_digest(struct dentry *dentry,
>   if (!dir_entry->digest_cache) {
>   cache = digest_cache_create(dentry, digest_list_path,
>   digest_cache->path_str,
> - dir_entry->name);
> + dir_entry->name, false,
> + false);
>   /* Ignore digest caches that cannot be instantiated. */
>   if (!cache) {
>   mutex_unlock(&dir_entry->digest_cache_mutex);
> @@ -158,6 +161,8 @@ digest_cache_dir_lookup_digest(struct dentry *dentry,
>  
>   /* Consume extra ref. from digest_cache_create(). */
>   dir_entry->digest_cache = cache;
> + /* Digest list was read, mark entry as prefetched. */
> + dir_entry->prefetched = true;
>   }
>   mutex_unlock(&dir_entry->digest_cache_mutex);
>  
> @@ -171,6 +176,54 @@ digest_cache_dir_lookup_digest(struct dentry *dentry,
>   return 0UL;
>  }
>  
> +/**

Re: [PATCH v3 04/29] riscv: zicfilp / zicfiss in dt-bindings (extensions.yaml)

2024-04-15 Thread Rob Herring
On Wed, Apr 10, 2024 at 02:37:21PM -0700, Deepak Gupta wrote:
> On Wed, Apr 10, 2024 at 4:58 AM Rob Herring  wrote:
> >
> > On Wed, Apr 03, 2024 at 04:34:52PM -0700, Deepak Gupta wrote:
> > > Make an entry for cfi extensions in extensions.yaml.
> > >
> > > Signed-off-by: Deepak Gupta 
> > > ---
> > >  .../devicetree/bindings/riscv/extensions.yaml  | 10 ++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml 
> > > b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > index 63d81dc895e5..45b87ad6cc1c 100644
> > > --- a/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
> > > @@ -317,6 +317,16 @@ properties:
> > >  The standard Zicboz extension for cache-block zeroing as 
> > > ratified
> > >  in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
> > >
> > > +- const: zicfilp
> > > +  description:
> > > +The standard Zicfilp extension for enforcing forward edge 
> > > control-flow
> > > +integrity in commit 3a20dc9 of riscv-cfi and is in public 
> > > review.
> >
> > Does "in public review" mean the commit sha is going to change?
> >
> 
> Unlikely. The next step after public review is to gather the comments from
> the public review.
> If something really pressing needs to be addressed, then yes,
> this will change.
> Otherwise it gets ratified as it is.

If the commit sha can change, then it is useless. What's the guarantee 
someone is going to remember to update it if it changes?

Rob



Re: [PATCH v4 09/14] digest_cache: Add support for directories

2024-04-15 Thread Jarkko Sakkinen
On Mon Apr 15, 2024 at 5:24 PM EEST, Roberto Sassu wrote:
> From: Roberto Sassu 
>
> In the environments where xattrs are not available (e.g. in the initial ram
> disk), the digest_cache LSM cannot precisely determine which digest list in
> a directory contains the desired reference digest. However, although
> slower, it would be desirable to search the digest in all digest lists of
> that directory.
>
> This is done in two steps. When a digest cache is being created,
> digest_cache_create() invokes digest_cache_dir_create(), to generate the
> list of current directory entries. Entries are placed in the list in
> ascending order by the  if prepended to the file name, or at the
> end of the list if not.
>
> The resulting digest cache has the IS_DIR bit set, to distinguish it from
> the digest caches created from regular files.
>
> Second, when a digest is searched in a directory digest cache,
> digest_cache_lookup() invokes digest_cache_dir_lookup_digest() to
> iteratively search that digest in each directory entry generated by
> digest_cache_dir_create().
>
> That list is stable, even if new files are added or deleted from that
> directory. A subsequent patch will invalidate the digest cache, forcing
> next callers of digest_cache_get() to get a new directory digest cache with
> the updated list of directory entries.
>
> If the current directory entry does not have a digest cache reference,
> digest_cache_dir_lookup_digest() invokes digest_cache_create() to create a
> new digest cache for that entry. In either case,
> digest_cache_dir_lookup_digest() calls then digest_cache_htable_lookup()
> with the new/existing digest cache to search the digest. Check and
> assignment of the digest cache in a directory entry is protected by the
> per entry digest_cache_mutex.
>
> The iteration stops when the digest is found. In that case,
> digest_cache_dir_lookup_digest() returns the digest cache reference of the
> current directory entry as the digest_cache_found_t type, so that callers
> of digest_cache_lookup() don't mistakenly try to call digest_cache_put()
> with that reference.
>
> This new reference type will be used to retrieve information about the
> digest cache containing the digest, which is not known in advance until the
> digest search is performed.
>
> The order of the list of directory entries influences the speed of the
> digest search. A search terminates faster if less digest caches have to be
> created. One way to optimize it could be to order the list of digest lists
> in the same order in which they are requested at boot.
>
> Finally, digest_cache_dir_free() releases the digest cache references
> stored in the list of directory entries, and frees the list itself.
>
> Signed-off-by: Roberto Sassu 
> ---
>  security/digest_cache/Makefile   |   2 +-
>  security/digest_cache/dir.c  | 193 +++
>  security/digest_cache/htable.c   |  22 +++-
>  security/digest_cache/internal.h |  45 +++
>  security/digest_cache/main.c |  12 ++
>  5 files changed, 271 insertions(+), 3 deletions(-)
>  create mode 100644 security/digest_cache/dir.c
>
> diff --git a/security/digest_cache/Makefile b/security/digest_cache/Makefile
> index 37a473c7bc28..e417da0383ab 100644
> --- a/security/digest_cache/Makefile
> +++ b/security/digest_cache/Makefile
> @@ -4,7 +4,7 @@
>  
>  obj-$(CONFIG_SECURITY_DIGEST_CACHE) += digest_cache.o
>  
> -digest_cache-y := main.o secfs.o htable.o populate.o modsig.o verif.o
> +digest_cache-y := main.o secfs.o htable.o populate.o modsig.o verif.o dir.o
>  
>  digest_cache-y += parsers/tlv.o
>  digest_cache-y += parsers/rpm.o
> diff --git a/security/digest_cache/dir.c b/security/digest_cache/dir.c
> new file mode 100644
> index ..7bfcdd5f7ef1
> --- /dev/null
> +++ b/security/digest_cache/dir.c
> @@ -0,0 +1,193 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2023-2024 Huawei Technologies Duesseldorf GmbH
> + *
> + * Author: Roberto Sassu 
> + *
> + * Manage digest caches from directories.
> + */
> +
> +#define pr_fmt(fmt) "DIGEST CACHE: "fmt
> +#include 
> +
> +#include "internal.h"
> +
> +/**
> + * digest_cache_dir_iter - Digest cache directory iterator
> + * @__ctx: iterate_dir() context
> + * @name: Name of file in the accessed directory
> + * @namelen: String length of @name
> + * @offset: Current position in the directory stream (see man readdir)
> + * @ino: Inode number
> + * @d_type: File type
> + *
> + * This function stores the names of the files in the containing directory in
> + * a linked list. If they are in the format <seq num>-<format>-<name>, this
> + * function orders them by seq num, so that digest lists are processed in the
> + * desired order. Otherwise, if <seq num>- is not included, it adds the name at
> + * the end of the list.
> + *
> + * Return: True to continue processing, false to stop.
> + */
> +static bool digest_cache_dir_iter(struct dir_context *__ctx, const char 
> *name,
> +   int namelen, loff_t offset, 

Re: [PATCH v4 04/14] digest_cache: Add hash tables and operations

2024-04-15 Thread Jarkko Sakkinen
On Mon Apr 15, 2024 at 5:24 PM EEST, Roberto Sassu wrote:
> From: Roberto Sassu 
>
> Add a linked list of hash tables to the digest cache, one per algorithm,
> containing the digests extracted from digest lists.
>
> The number of hash table slots is determined by dividing the number of
> digests to add to the average depth of the collision list defined with
> CONFIG_DIGEST_CACHE_HTABLE_DEPTH (currently set to 30). It can be changed
> in the kernel configuration.
>
> Add digest_cache_htable_init() and digest_cache_htable_add(), to be called
> by digest list parsers, in order to allocate the hash tables and to add
> extracted digests.
>
> Add digest_cache_htable_free(), to let the digest_cache LSM free the hash
> tables at the time a digest cache is freed.
>
> Add digest_cache_htable_lookup() to search a digest in the hash table of a
> digest cache for a given algorithm.
>
> Add digest_cache_lookup() to the public API, to let users of the
> digest_cache LSM search a digest in a digest cache and, in a subsequent
> patch, to search it in the digest caches for each directory entry.
>
> Return the digest cache containing the digest, as a different type,
> digest_cache_found_t to avoid it being accidentally put. Also, introduce
> digest_cache_from_found_t() to explicitly convert it back to a digest cache
> for further use (e.g. retrieving verification data, introduced later).
>
> Finally, add digest_cache_hash_key() to compute the hash table key from the
> first two bytes of the digest (modulo the number of slots).
>
> Signed-off-by: Roberto Sassu 
> ---
>  include/linux/digest_cache.h |  34 +
>  security/digest_cache/Kconfig|  11 ++
>  security/digest_cache/Makefile   |   2 +-
>  security/digest_cache/htable.c   | 250 +++
>  security/digest_cache/internal.h |  43 ++
>  security/digest_cache/main.c |   3 +
>  6 files changed, 342 insertions(+), 1 deletion(-)
>  create mode 100644 security/digest_cache/htable.c
>
> diff --git a/include/linux/digest_cache.h b/include/linux/digest_cache.h
> index e79f94a60b0f..4872700ac205 100644
> --- a/include/linux/digest_cache.h
> +++ b/include/linux/digest_cache.h
> @@ -11,12 +11,39 @@
>  #define _LINUX_DIGEST_CACHE_H
>  
>  #include 
> +#include 
>  
>  struct digest_cache;
>  
> +/**
> + * typedef digest_cache_found_t - Digest cache reference as numeric value
> + *
> + * This new type represents a digest cache reference that should not be put.
> + */
> +typedef unsigned long digest_cache_found_t;
> +
> +/**
> + * digest_cache_from_found_t - Convert digest_cache_found_t to digest cache 
> ptr
> + * @found: digest_cache_found_t value
> + *
> + * Convert the digest_cache_found_t returned by digest_cache_lookup() to a
> + * digest cache pointer, so that it can be passed to the other functions of 
> the
> + * API.
> + *
> + * Return: Digest cache pointer.
> + */
> +static inline struct digest_cache *
> +digest_cache_from_found_t(digest_cache_found_t found)
> +{
> + return (struct digest_cache *)found;
> +}
> +
>  #ifdef CONFIG_SECURITY_DIGEST_CACHE
>  struct digest_cache *digest_cache_get(struct dentry *dentry);
>  void digest_cache_put(struct digest_cache *digest_cache);
> +digest_cache_found_t digest_cache_lookup(struct dentry *dentry,
> +  struct digest_cache *digest_cache,
> +  u8 *digest, enum hash_algo algo);
>  
>  #else
>  static inline struct digest_cache *digest_cache_get(struct dentry *dentry)
> @@ -28,5 +55,12 @@ static inline void digest_cache_put(struct digest_cache 
> *digest_cache)
>  {
>  }
>  
> +static inline digest_cache_found_t
> +digest_cache_lookup(struct dentry *dentry, struct digest_cache *digest_cache,
> + u8 *digest, enum hash_algo algo)
> +{
> + return 0UL;
> +}
> +
>  #endif /* CONFIG_SECURITY_DIGEST_CACHE */
>  #endif /* _LINUX_DIGEST_CACHE_H */
> diff --git a/security/digest_cache/Kconfig b/security/digest_cache/Kconfig
> index dfabe5d6e3ca..71017954e5c5 100644
> --- a/security/digest_cache/Kconfig
> +++ b/security/digest_cache/Kconfig
> @@ -18,3 +18,14 @@ config DIGEST_LIST_DEFAULT_PATH
> It can be changed at run-time, by writing the new path to the
> securityfs interface. Digest caches created with the old path are
> not affected by the change.
> +
> +config DIGEST_CACHE_HTABLE_DEPTH
> + int
> + default 30
> + help
> +   Desired average depth of the collision list in the digest cache
> +   hash tables.
> +
> +   A smaller number will increase the amount of hash table slots, and
> +   make the search faster. A bigger number will decrease the number of
> +   hash table slots, but make the search slower.
> diff --git a/security/digest_cache/Makefile b/security/digest_cache/Makefile
> index 1330655e33b1..7e00c53d8f55 100644
> --- a/security/digest_cache/Makefile
> +++ b/security/digest_cache/Makefile
> @@ -4,4 +4,4 @@
>  
>  

Re: [PATCH v4 03/14] digest_cache: Add securityfs interface

2024-04-15 Thread Jarkko Sakkinen
On Mon Apr 15, 2024 at 5:24 PM EEST, Roberto Sassu wrote:
> From: Roberto Sassu 
>
> Add the digest_cache_path file in securityfs, to let root change/read the
> default path (file or directory) from where digest lists are looked up.
>
> An RW semaphore prevents the default path from changing while
> digest_list_new() and read_default_path() are executed, so that those read
> a stable value. Multiple digest_list_new() and read_default_path() calls,
> instead, can be done in parallel, since they are the readers.
>
> Changing the default path does not affect digest caches created with the
> old path.
>
> Signed-off-by: Roberto Sassu 
> ---
>  security/digest_cache/Kconfig|  4 ++
>  security/digest_cache/Makefile   |  2 +-
>  security/digest_cache/internal.h |  1 +
>  security/digest_cache/main.c | 10 +++-
>  security/digest_cache/secfs.c| 87 
>  5 files changed, 102 insertions(+), 2 deletions(-)
>  create mode 100644 security/digest_cache/secfs.c
>
> diff --git a/security/digest_cache/Kconfig b/security/digest_cache/Kconfig
> index e53fbf0779d6..dfabe5d6e3ca 100644
> --- a/security/digest_cache/Kconfig
> +++ b/security/digest_cache/Kconfig
> @@ -14,3 +14,7 @@ config DIGEST_LIST_DEFAULT_PATH
>   default "/etc/digest_lists"
>   help
> Default directory where digest_cache LSM expects to find digest lists.
> +
> +   It can be changed at run-time, by writing the new path to the
> +   securityfs interface. Digest caches created with the old path are
> +   not affected by the change.
> diff --git a/security/digest_cache/Makefile b/security/digest_cache/Makefile
> index 48848c41253e..1330655e33b1 100644
> --- a/security/digest_cache/Makefile
> +++ b/security/digest_cache/Makefile
> @@ -4,4 +4,4 @@
>  
>  obj-$(CONFIG_SECURITY_DIGEST_CACHE) += digest_cache.o
>  
> -digest_cache-y := main.o
> +digest_cache-y := main.o secfs.o
> diff --git a/security/digest_cache/internal.h 
> b/security/digest_cache/internal.h
> index 5f04844af3a5..bbf5eefe5c82 100644
> --- a/security/digest_cache/internal.h
> +++ b/security/digest_cache/internal.h
> @@ -49,6 +49,7 @@ struct digest_cache_security {
>  
>  extern struct lsm_blob_sizes digest_cache_blob_sizes;
>  extern char *default_path_str;
> +extern struct rw_semaphore default_path_sem;
>  
>  static inline struct digest_cache_security *
>  digest_cache_get_security(const struct inode *inode)
> diff --git a/security/digest_cache/main.c b/security/digest_cache/main.c
> index 14dba8915e99..661c8d106791 100644
> --- a/security/digest_cache/main.c
> +++ b/security/digest_cache/main.c
> @@ -18,6 +18,9 @@ static struct kmem_cache *digest_cache_cache __read_mostly;
>  
>  char *default_path_str = CONFIG_DIGEST_LIST_DEFAULT_PATH;
>  
> +/* Protects default_path_str. */
> +struct rw_semaphore default_path_sem;
> +
>  /**
>   * digest_cache_alloc_init - Allocate and initialize a new digest cache
>   * @path_str: Path string of the digest list
> @@ -274,9 +277,12 @@ struct digest_cache *digest_cache_get(struct dentry 
> *dentry)
>  
>   /* Serialize accesses to inode for which the digest cache is used. */
>   mutex_lock(&dig_sec->dig_user_mutex);
> - if (!dig_sec->dig_user)
> + if (!dig_sec->dig_user) {
> + down_read(&default_path_sem);
>   /* Consume extra reference from digest_cache_create(). */
>   dig_sec->dig_user = digest_cache_new(dentry);
> + up_read(&default_path_sem);
> + }
>  
>   if (dig_sec->dig_user)
>   /* Increment ref. count for reference returned to the caller. */
> @@ -386,6 +392,8 @@ static const struct lsm_id digest_cache_lsmid = {
>   */
>  static int __init digest_cache_init(void)
>  {
> + init_rwsem(&default_path_sem);
> +
>   digest_cache_cache = kmem_cache_create("digest_cache_cache",
>  sizeof(struct digest_cache),
>  0, SLAB_PANIC,
> diff --git a/security/digest_cache/secfs.c b/security/digest_cache/secfs.c
> new file mode 100644
> index ..d3a37bf3588e
> --- /dev/null
> +++ b/security/digest_cache/secfs.c
> @@ -0,0 +1,87 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2023-2024 Huawei Technologies Duesseldorf GmbH
> + *
> + * Author: Roberto Sassu 
> + *
> + * Implement the securityfs interface of the digest_cache LSM.
> + */
> +
> +#define pr_fmt(fmt) "DIGEST CACHE: "fmt
> +#include 
> +
> +#include "internal.h"
> +
> +static struct dentry *default_path_dentry;
> +
> +/**
> + * write_default_path - Write default path
> + * @file: File descriptor of the securityfs file
> + * @buf: User space buffer
> + * @datalen: Amount of data to write
> + * @ppos: Current position in the file
> + *
> + * This function sets the new default path where digest lists can be found.
> + * Can be either a regular file or a directory.
> + *
> + * Return: Length of path written on success, a POSIX error code otherwise.
> + */
> +static 

Re: [PATCH v4 02/14] security: Introduce the digest_cache LSM

2024-04-15 Thread Jarkko Sakkinen
On Mon Apr 15, 2024 at 5:24 PM EEST, Roberto Sassu wrote:
> From: Roberto Sassu 
>
> Introduce the digest_cache LSM, to collect digests from various sources
> (called digest lists), and to store them in kernel memory, in a set of hash
> tables forming a digest cache. Extracted digests can be used as reference
> values for integrity verification of file data or metadata.
>
> A digest cache has three types of references: in the inode security blob of
> the digest list the digest cache was created from (dig_owner field); in the
> security blob of the inodes for which the digest cache is requested
> (dig_user field); a reference returned by digest_cache_get().
>
> References are released with digest_cache_put(), in the first two cases
> when inodes are evicted from memory, in the last case when that function is
> explicitly called. Obtaining a digest cache reference means that the digest
> cache remains valid and cannot be freed until releasing it and until the
> total number of references (stored in the digest cache) becomes zero.
>
> When digest_cache_get() is called on an inode to compare its digest with
> a reference value, the digest_cache LSM knows which digest cache to get
> from the new security.digest_list xattr added to that inode, which contains
> the file name of the desired digest list that digests will be extracted from.
>
> All digest lists are expected to be in the same directory, defined in the
> kernel config, and modifiable (with a later patch) at run-time through
> securityfs. When the digest_cache LSM reads the security.digest_list xattr,
> it uses its value as last path component, appended to the default path
> (unless the default path is a file). If an inode does not have that xattr,
> the default path is considered as the final destination.
>
> The default path can be either a file or a directory. If it is a file, the
> digest_cache LSM always uses the same digest cache from that file to verify
> all inodes (the xattr, if present, is ignored). If it is a directory, and
> the inode to verify does not have the xattr, a subsequent patch will make
> it possible to iterate and lookup on the digest caches created from each
> directory entry.
>
> Digest caches are created on demand, only when digest_cache_get() is
> called. The first time a digest cache is requested, the digest_cache LSM
> creates it and sets its reference in the dig_owner and dig_user fields of
> the respective inode security blobs. On the next requests, the previously
> set reference is returned, after incrementing the reference count.
>
> Since there might be multiple digest_cache_get() calls for the same inode,
> or for different inodes pointing to the same digest list, dig_owner_mutex
> and dig_user_mutex have been introduced to protect the check and assignment
> of the digest cache reference in the inode security blob.
>
> Contenders that didn't get the lock also have to wait until the digest
> cache is fully instantiated (when the bit INIT_IN_PROGRESS is cleared).
> Dig_owner_mutex cannot be used for waiting on the instantiation to avoid
> lock inversion with the inode lock for directories.
>
> Signed-off-by: Roberto Sassu 
> ---
>  MAINTAINERS   |   6 +
>  include/linux/digest_cache.h  |  32 ++
>  include/uapi/linux/lsm.h  |   1 +
>  include/uapi/linux/xattr.h|   3 +
>  security/Kconfig  |  11 +-
>  security/Makefile |   1 +
>  security/digest_cache/Kconfig |  16 +
>  security/digest_cache/Makefile|   7 +
>  security/digest_cache/internal.h  |  86 
>  security/digest_cache/main.c  | 404 ++
>  security/security.c   |   3 +-
>  .../selftests/lsm/lsm_list_modules_test.c |   3 +
>  12 files changed, 567 insertions(+), 6 deletions(-)
>  create mode 100644 include/linux/digest_cache.h
>  create mode 100644 security/digest_cache/Kconfig
>  create mode 100644 security/digest_cache/Makefile
>  create mode 100644 security/digest_cache/internal.h
>  create mode 100644 security/digest_cache/main.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b1ca23ab8732..72801a88449c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6193,6 +6193,12 @@ L: linux-g...@vger.kernel.org
>  S:   Maintained
>  F:   drivers/gpio/gpio-gpio-mm.c
>  
> +DIGEST_CACHE LSM
> +M:   Roberto Sassu 
> +L:   linux-security-mod...@vger.kernel.org
> +S:   Maintained
> +F:   security/digest_cache/
> +
>  DIGITEQ AUTOMOTIVE MGB4 V4L2 DRIVER
>  M:   Martin Tuma 
>  L:   linux-me...@vger.kernel.org

Nit: afaik, MAINTAINERS updates should be split out.

> diff --git a/include/linux/digest_cache.h b/include/linux/digest_cache.h
> new file mode 100644
> index ..e79f94a60b0f
> --- /dev/null
> +++ b/include/linux/digest_cache.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 

Re: [GIT PULL] Kselftest fixes update for Linux 6.9-rc5

2024-04-15 Thread pr-tracker-bot
The pull request you sent on Mon, 15 Apr 2024 10:23:12 -0600:

> git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest 
> tags/linux_kselftest-fixes-6.9-rc5

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3fdfcd98f002ade3f92038f7c164d45b2e8b7a79

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



Re: [PATCH v4 01/14] lib: Add TLV parser

2024-04-15 Thread Jarkko Sakkinen
On Mon Apr 15, 2024 at 5:24 PM EEST, Roberto Sassu wrote:
> From: Roberto Sassu 
>
> Add a parser of a generic TLV format:

What is TLV?

BR, Jarkko



Re: [PATCH v4 00/14] security: digest_cache LSM

2024-04-15 Thread Jarkko Sakkinen
On Mon Apr 15, 2024 at 5:24 PM EEST, Roberto Sassu wrote:
> From: Roberto Sassu 
>
> Integrity detection and protection has long been a desirable feature, to
> reach a large user base and mitigate the risk of flaws in the software
> and attacks.
>
> However, while solutions exist, they struggle to reach the large user
> base, due to requiring higher than desired constraints on performance,
> flexibility and configurability, that only security conscious people are
> willing to accept.
>
> This is where the new digest_cache LSM comes into play, it offers
> additional support for new and existing integrity solutions, to make
> them faster and easier to deploy.

Sorry for nitpicking, but what are the existing integrity solutions, 
and how does this help with their struggle? I.e. what is the gist here?

BR, Jarkko



Re: [PATCH v10 1/5] mseal: Wire up mseal syscall

2024-04-15 Thread Jeff Xu
On Mon, Apr 15, 2024 at 11:21 AM Linus Torvalds
 wrote:
>
> On Mon, 15 Apr 2024 at 11:11, Muhammad Usama Anjum
>  wrote:
> >
> > It isn't logical to wire up something which isn't present
>
> Actually, with system calls, the rules end up being almost opposite.
>
> There's no point in adding the code if it's not reachable. So adding
> the system call code before adding the wiring makes no sense.
>
> So you have two cases: add the stubs first, or add the code first.
> Neither does anything without the other.
>
> So then you go "add both in the same commit" option, which ends up
> being horrible from a "review the code" standpoint. The two parts are
> entirely different and mixing them up makes the patch very unclear
> (and has very different target audiences for reviewing it - the MM
> people really shouldn't have to look at the architecture wiring
> parts).
>
> End result: there are no "this is the logical ordering" cases.
>
> But the "wire up system calls" part actually has some reasons to be first:
>
>  - it reserves the system call number
>
>  - it adds the "when system call isn't enabled, return -ENOSYS"
> conditional system call logic
>
> so I actually tend to prefer this ordering when it comes to system calls.
>
I confirm that the wire-up change can be merged on its own, i.e. the build
will pass, and -ENOSYS will be returned at runtime.

Thanks Linus for clarifying this.
-Jeff


> Linus



Re: [PATCH] KVM: selftests: Avoid assuming "sudo" exists

2024-04-15 Thread Muhammad Usama Anjum
On 4/15/24 7:43 PM, Brendan Jackman wrote:
> I ran into a failure running this test on a minimal rootfs.
I've run into a similar issue before with another test. It's a clever solution.

> 
> Can be fixed by just skipping the "sudo" in case we are already root.
> 
> Signed-off-by: Brendan Jackman 
Reviewed-by: Muhammad Usama Anjum 

> ---
>  tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh 
> b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
> index 7cbb409801eea..0e56822e8e0bf 100755
> --- a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
> +++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
> @@ -13,10 +13,21 @@ NX_HUGE_PAGES_RECOVERY_RATIO=$(cat 
> /sys/module/kvm/parameters/nx_huge_pages_reco
>  NX_HUGE_PAGES_RECOVERY_PERIOD=$(cat 
> /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms)
>  HUGE_PAGES=$(cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages)
>  
> +# If we're already root, the host might not have sudo.
> +if [ $(whoami) == "root" ]; then
> + function maybe_sudo () {
> + "$@"
> + }
> +else
> + function maybe_sudo () {
> + sudo "$@"
> + }
> +fi
> +
>  set +e
>  
>  function sudo_echo () {
> - echo "$1" | sudo tee -a "$2" > /dev/null
> + echo "$1" | maybe_sudo tee -a "$2" > /dev/null
>  }
>  
>  NXECUTABLE="$(dirname $0)/nx_huge_pages_test"
> 
> ---
> base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
> change-id: 20240415-kvm-selftests-no-sudo-1a55f831f882
> 
> Best regards,

-- 
BR,
Muhammad Usama Anjum



Re: [PATCH v1] KVM: s390x: selftests: Add shared zeropage test

2024-04-15 Thread Muhammad Usama Anjum
On 4/12/24 1:43 PM, David Hildenbrand wrote:
> Let's test that we can have shared zeropages in our process as long as
> storage keys are not getting used, that shared zeropages are properly
> unshared (replaced by anonymous pages) once storage keys are enabled,
> and that no new shared zeropages are populated after storage keys
> were enabled.
> 
> We require the new pagemap interface to detect the shared zeropage.
> 
> On an old kernel (zeropages always disabled):
>   # ./s390x/shared_zeropage_test
>   TAP version 13
>   1..3
>   not ok 1 Shared zeropages should be enabled
>   ok 2 Shared zeropage should be gone
>   ok 3 Shared zeropages should be disabled
>   # Totals: pass:2 fail:1 xfail:0 xpass:0 skip:0 error:0
> 
> On a fixed kernel:
>   # ./s390x/shared_zeropage_test
>   TAP version 13
>   1..3
>   ok 1 Shared zeropages should be enabled
>   ok 2 Shared zeropage should be gone
>   ok 3 Shared zeropages should be disabled
>   # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
> 
> Testing of UFFDIO_ZEROPAGE can be added later.
> 
> Cc: Christian Borntraeger 
> Cc: Janosch Frank 
> Cc: Claudio Imbrenda 
> Cc: Thomas Huth 
> Cc: Alexander Gordeev 
> Cc: Paolo Bonzini 
> Cc: Shuah Khan 
> Signed-off-by: David Hildenbrand 
Acked-by: Muhammad Usama Anjum 

> ---
> 
> To get it right this time, test the relevant cases.
> 
> v3 of fixes are at:
>  https://lore.kernel.org/all/20240411161441.910170-1-da...@redhat.com/T/#u
> 
> ---
>  tools/testing/selftests/kvm/Makefile  |   1 +
>  .../kvm/s390x/shared_zeropage_test.c  | 110 ++
>  2 files changed, 111 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/s390x/shared_zeropage_test.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index 741c7dc16afc..ed4ad591f193 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -180,6 +180,7 @@ TEST_GEN_PROGS_s390x += s390x/sync_regs_test
>  TEST_GEN_PROGS_s390x += s390x/tprot
>  TEST_GEN_PROGS_s390x += s390x/cmma_test
>  TEST_GEN_PROGS_s390x += s390x/debug_test
> +TEST_GEN_PROGS_s390x += s390x/shared_zeropage_test
>  TEST_GEN_PROGS_s390x += demand_paging_test
>  TEST_GEN_PROGS_s390x += dirty_log_test
>  TEST_GEN_PROGS_s390x += guest_print_test
> diff --git a/tools/testing/selftests/kvm/s390x/shared_zeropage_test.c 
> b/tools/testing/selftests/kvm/s390x/shared_zeropage_test.c
> new file mode 100644
> index ..74e829748fb1
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/s390x/shared_zeropage_test.c
> @@ -0,0 +1,110 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Test shared zeropage handling (with/without storage keys)
> + *
> + * Copyright (C) 2024, Red Hat, Inc.
> + */
> +#include 
> +
> +#include 
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "kselftest.h"
> +
> +static void set_storage_key(void *addr, uint8_t skey)
> +{
> + asm volatile("sske %0,%1" : : "d" (skey), "a" (addr));
> +}
> +
> +static void guest_code(void)
> +{
> + /* Issue some storage key instruction. */
> + set_storage_key((void *)0, 0x98);
> + GUEST_DONE();
> +}
> +
> +/*
> + * Returns 1 if the shared zeropage is mapped, 0 if something else is mapped.
> + * Returns < 0 on error or if nothing is mapped.
> + */
> +static int maps_shared_zeropage(int pagemap_fd, void *addr)
> +{
> + struct page_region region;
> + struct pm_scan_arg arg = {
> + .start = (uintptr_t)addr,
> + .end = (uintptr_t)addr + 4096,
> + .vec = (uintptr_t)&region,
> + .vec_len = 1,
> + .size = sizeof(struct pm_scan_arg),
> + .category_mask = PAGE_IS_PFNZERO,
> + .category_anyof_mask = PAGE_IS_PRESENT,
> + .return_mask = PAGE_IS_PFNZERO,
> + };
> + return ioctl(pagemap_fd, PAGEMAP_SCAN, &arg);
It's good to see more users for it.

> +}
> +
> +int main(int argc, char *argv[])
> +{
> + char *mem, *page0, *page1, *page2, tmp;
> + const size_t pagesize = getpagesize();
> + struct kvm_vcpu *vcpu;
> + struct kvm_vm *vm;
> + struct ucall uc;
> + int pagemap_fd;
> +
> + ksft_print_header();
> + ksft_set_plan(3);
> +
> + /*
> +  * We'll use memory that is not mapped into the VM for simplicity.
> +  * Shared zeropages are enabled/disabled per-process.
> +  */
> + mem = mmap(0, 3 * pagesize, PROT_READ, MAP_PRIVATE|MAP_ANON, -1, 0);
> + TEST_ASSERT(mem != MAP_FAILED, "mmap() failed");
> +
> + /* Disable THP. Ignore errors on older kernels. */
> + madvise(mem, 3 * pagesize, MADV_NOHUGEPAGE);
> +
> + page0 = mem;
> + page1 = page0 + pagesize;
> + page2 = page1 + pagesize;
> +
> + /* Can we even detect shared zeropages? */
> + pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
> + TEST_REQUIRE(pagemap_fd >= 0);
> +
> + tmp = *page0;
> + asm 

Re: [PATCH net-next v2 0/6] selftests: net: exercise page pool reporting via netlink

2024-04-15 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Fri, 12 Apr 2024 07:14:30 -0700 you wrote:
> Add a basic test for page pool netlink reporting.
> 
> v2:
>  - pass args as *args (patch 3)
>  - improve the test and add busy wait helper (patch 6)
> v1: https://lore.kernel.org/all/20240411012815.174400-1-k...@kernel.org/
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/6] net: netdevsim: add some fake page pool use
https://git.kernel.org/netdev/net-next/c/1580cbcbfe77
  - [net-next,v2,2/6] tools: ynl: don't return None for dumps
https://git.kernel.org/netdev/net-next/c/72ba6cba0a6e
  - [net-next,v2,3/6] selftests: net: print report check location in python 
tests
https://git.kernel.org/netdev/net-next/c/eeb409bde964
  - [net-next,v2,4/6] selftests: net: print full exception on failure
https://git.kernel.org/netdev/net-next/c/99583b970b90
  - [net-next,v2,5/6] selftests: net: support use of NetdevSimDev under "with" 
in python
https://git.kernel.org/netdev/net-next/c/8554d6e39b6a
  - [net-next,v2,6/6] selftests: net: exercise page pool reporting via netlink
https://git.kernel.org/netdev/net-next/c/05fa5c31b988

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





Re: [PATCH 2/2] selftests: power_supply: Make it POSIX-compliant

2024-04-15 Thread Muhammad Usama Anjum
On 4/15/24 8:32 PM, Nícolas F. R. A. Prado wrote:
> There is one use of bash specific syntax in the script. Change it to the
> equivalent POSIX syntax. This doesn't change functionality and allows
> the test to be run on shells other than bash.
> 
> Reported-by: Mike Looijmans 
> Closes: 
> https://lore.kernel.org/all/efae4037-c22a-40be-8ba9-7c1c12ece...@topic.nl/
> Fixes: 4a679c5afca0 ("selftests: Add test to verify power supply properties")
> Signed-off-by: Nícolas F. R. A. Prado 
Reviewed-by: Muhammad Usama Anjum 

> ---
>  tools/testing/selftests/power_supply/test_power_supply_properties.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git 
> a/tools/testing/selftests/power_supply/test_power_supply_properties.sh 
> b/tools/testing/selftests/power_supply/test_power_supply_properties.sh
> index df272dfe1d2a..a66b1313ed88 100755
> --- a/tools/testing/selftests/power_supply/test_power_supply_properties.sh
> +++ b/tools/testing/selftests/power_supply/test_power_supply_properties.sh
> @@ -23,7 +23,7 @@ count_tests() {
>   total_tests=0
>  
>   for i in $SUPPLIES; do
> - total_tests=$(("$total_tests" + "$NUM_TESTS"))
> + total_tests=$((total_tests + NUM_TESTS))
>   done
>  
>   echo "$total_tests"
> 

-- 
BR,
Muhammad Usama Anjum



Re: [PATCH 1/2] selftests: ktap_helpers: Make it POSIX-compliant

2024-04-15 Thread Muhammad Usama Anjum
On 4/15/24 8:32 PM, Nícolas F. R. A. Prado wrote:
> There are a couple uses of bash specific syntax in the script. Change
> them to the equivalent POSIX syntax. This doesn't change functionality
> and allows non-bash test scripts to make use of these helpers.
> 
> Reported-by: Mike Looijmans 
> Closes: 
> https://lore.kernel.org/all/efae4037-c22a-40be-8ba9-7c1c12ece...@topic.nl/
> Fixes: 2dd0b5a8fcc4 ("selftests: ktap_helpers: Add a helper to finish the 
> test")
> Fixes: 14571ab1ad21 ("kselftest: Add new test for detecting unprobed 
> Devicetree devices")
> Signed-off-by: Nícolas F. R. A. Prado 
Reviewed-by: Muhammad Usama Anjum 

> ---
>  tools/testing/selftests/kselftest/ktap_helpers.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/kselftest/ktap_helpers.sh 
> b/tools/testing/selftests/kselftest/ktap_helpers.sh
> index f2fbb914e058..79a125eb24c2 100644
> --- a/tools/testing/selftests/kselftest/ktap_helpers.sh
> +++ b/tools/testing/selftests/kselftest/ktap_helpers.sh
> @@ -43,7 +43,7 @@ __ktap_test() {
>   directive="$3" # optional
>  
>   local directive_str=
> - [[ ! -z "$directive" ]] && directive_str="# $directive"
> + [ ! -z "$directive" ] && directive_str="# $directive"
>  
>   echo $result $KTAP_TESTNO $description $directive_str
>  
> @@ -99,7 +99,7 @@ ktap_exit_fail_msg() {
>  ktap_finished() {
>   ktap_print_totals
>  
> - if [ $(("$KTAP_CNT_PASS" + "$KTAP_CNT_SKIP")) -eq "$KSFT_NUM_TESTS" ]; 
> then
> + if [ $((KTAP_CNT_PASS + KTAP_CNT_SKIP)) -eq "$KSFT_NUM_TESTS" ]; then
>   exit "$KSFT_PASS"
>   else
>   exit "$KSFT_FAIL"
> 

-- 
BR,
Muhammad Usama Anjum



Re: [PATCH v10 3/5] selftest mm/mseal memory sealing

2024-04-15 Thread Muhammad Usama Anjum
Please fix following for this and fifth patch as well:

--> checkpatch.pl --codespell tools/testing/selftests/mm/mseal_test.c

WARNING: Macros with flow control statements should be avoided
#42: FILE: tools/testing/selftests/mm/mseal_test.c:42:
+#define FAIL_TEST_IF_FALSE(c) do {\
+   if (!(c)) {\
+   ksft_test_result_fail("%s, line:%d\n", __func__,
__LINE__);\
+   goto test_end;\
+   } \
+   } \
+   while (0)

WARNING: Macros with flow control statements should be avoided
#50: FILE: tools/testing/selftests/mm/mseal_test.c:50:
+#define SKIP_TEST_IF_FALSE(c) do {\
+   if (!(c)) {\
+   ksft_test_result_skip("%s, line:%d\n", __func__,
__LINE__);\
+   goto test_end;\
+   } \
+   } \
+   while (0)

WARNING: Macros with flow control statements should be avoided
#59: FILE: tools/testing/selftests/mm/mseal_test.c:59:
+#define TEST_END_CHECK() {\
+   ksft_test_result_pass("%s\n", __func__);\
+   return;\
+test_end:\
+   return;\
+}
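
Purely as an illustrative sketch (not necessarily the fix the author will choose, and assuming <stdbool.h> plus the kselftest.h helpers the test already includes), one way to address these warnings is to move the reporting into a small helper so the macro hides no control flow and the bail-out stays visible at the call site:

	/* Illustrative only: report the result through kselftest and let the
	 * caller decide how to bail out, so the macro carries no hidden goto.
	 */
	static bool ksft_check(bool cond, const char *func, int line)
	{
		if (!cond)
			ksft_test_result_fail("%s, line:%d\n", func, line);
		return cond;
	}

	#define CHECK(c) ksft_check((c), __func__, __LINE__)

	/* usage: if (!CHECK(ptr != MAP_FAILED)) return; */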


On 4/15/24 9:35 PM, jef...@chromium.org wrote:
> From: Jeff Xu 
> 
> selftest for memory sealing change in mmap() and mseal().
> 
> Signed-off-by: Jeff Xu 
> ---
>  tools/testing/selftests/mm/.gitignore   |1 +
>  tools/testing/selftests/mm/Makefile |1 +
>  tools/testing/selftests/mm/mseal_test.c | 1836 +++
>  3 files changed, 1838 insertions(+)
>  create mode 100644 tools/testing/selftests/mm/mseal_test.c
> 
> diff --git a/tools/testing/selftests/mm/.gitignore 
> b/tools/testing/selftests/mm/.gitignore
> index d26e962f2ac4..98eaa4590f11 100644
> --- a/tools/testing/selftests/mm/.gitignore
> +++ b/tools/testing/selftests/mm/.gitignore
> @@ -47,3 +47,4 @@ mkdirty
>  va_high_addr_switch
>  hugetlb_fault_after_madv
>  hugetlb_madv_vs_map
> +mseal_test
> diff --git a/tools/testing/selftests/mm/Makefile 
> b/tools/testing/selftests/mm/Makefile
> index eb5f39a2668b..95d10fe1b3c1 100644
> --- a/tools/testing/selftests/mm/Makefile
> +++ b/tools/testing/selftests/mm/Makefile
> @@ -59,6 +59,7 @@ TEST_GEN_FILES += mlock2-tests
>  TEST_GEN_FILES += mrelease_test
>  TEST_GEN_FILES += mremap_dontunmap
>  TEST_GEN_FILES += mremap_test
> +TEST_GEN_FILES += mseal_test
>  TEST_GEN_FILES += on-fault-limit
>  TEST_GEN_FILES += pagemap_ioctl
>  TEST_GEN_FILES += thuge-gen
> diff --git a/tools/testing/selftests/mm/mseal_test.c 
> b/tools/testing/selftests/mm/mseal_test.c
> new file mode 100644
> index 000000000000..06c780d1d8e5
> --- /dev/null
> +++ b/tools/testing/selftests/mm/mseal_test.
> +static void __write_pkey_reg(u64 pkey_reg)
> +{
> +#if defined(__i386__) || defined(__x86_64__) /* arch */
> + unsigned int eax = pkey_reg;
> + unsigned int ecx = 0;
> + unsigned int edx = 0;
> +
> + asm volatile(".byte 0x0f,0x01,0xef\n\t"
> + : : "a" (eax), "c" (ecx), "d" (edx));
> + assert(pkey_reg == __read_pkey_reg());
Use ksft_exit_fail_msg() instead of assert() to stay inside the TAP format
when the condition is false and an error is reported.
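
A minimal sketch of what that could look like (hypothetical, not the submitted code; it reuses the __read_pkey_reg() helper quoted above):

	/* Hypothetical replacement for the assert(): report through TAP
	 * instead of aborting the whole binary.
	 */
	if (pkey_reg != __read_pkey_reg())
		ksft_exit_fail_msg("pkey_reg mismatch after write: %llx\n",
				   (unsigned long long)pkey_reg);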

> +int main(int argc, char **argv)
> +{
> + bool test_seal = seal_support();
> +
> + ksft_print_header();
> +
> + if (!test_seal)
> + ksft_exit_skip("sealing not supported, check CONFIG_64BIT\n");
> +
> + if (!pkey_supported())
> + ksft_print_msg("PKEY not supported\n");
> +
> + ksft_set_plan(80);
> +
> + test_seal_addseal();
> + test_seal_unmapped_start();
> + test_seal_unmapped_middle();
> + test_seal_unmapped_end();
> + test_seal_multiple_vmas();
> + test_seal_split_start();
> + test_seal_split_end();
> + test_seal_invalid_input();
> + test_seal_zero_length();
> + test_seal_twice();
> +
> + test_seal_mprotect(false);
> + test_seal_mprotect(true);
> +
> + test_seal_start_mprotect(false);
> + test_seal_start_mprotect(true);
> +
> + test_seal_end_mprotect(false);
> + test_seal_end_mprotect(true);
> +
> + test_seal_mprotect_unalign_len(false);
> + test_seal_mprotect_unalign_len(true);
> +
> + test_seal_mprotect_unalign_len_variant_2(false);
> + test_seal_mprotect_unalign_len_variant_2(true);
> +
> + test_seal_mprotect_two_vma(false);
> + test_seal_mprotect_two_vma(true);
> +
> + test_seal_mprotect_two_vma_with_split(false);
> + test_seal_mprotect_two_vma_with_split(true);
> +
> + test_seal_mprotect_partial_mprotect(false);
> + test_seal_mprotect_partial_mprotect(true);
> +
> + test_seal_mprotect_two_vma_with_gap(false);
> + test_seal_mprotect_two_vma_with_gap(true);
> +
> + test_seal_mprotect_merge(false);
> + test_seal_mprotect_merge(true);
> +
> + test_seal_mprotect_split(false);
> + test_seal_mprotect_split(true);
> +
> + test_seal_munmap(false);
> + test_seal_munmap(true);
> + test_seal_munmap_two_vma(false);
> + test_seal_munmap_two_vma(true);
> + 

Re: [PATCH v10 1/5] mseal: Wire up mseal syscall

2024-04-15 Thread Linus Torvalds
On Mon, 15 Apr 2024 at 11:11, Muhammad Usama Anjum
 wrote:
>
> It isn't logical to wire up something which isn't present

Actually, with system calls, the rules end up being almost opposite.

There's no point in adding the code if it's not reachable. So adding
the system call code before adding the wiring makes no sense.

So you have two cases: add the stubs first, or add the code first.
Neither does anything without the other.

So then you go "add both in the same commit" option, which ends up
being horrible from a "review the code" standpoint. The two parts are
entirely different and mixing them up makes the patch very unclear
(and has very different target audiences for reviewing it - the MM
people really shouldn't have to look at the architecture wiring
parts).

End result: there are no "this is the logical ordering" cases.

But the "wire up system calls" part actually has some reasons to be first:

 - it reserves the system call number

 - it adds the "when system call isn't enabled, return -ENOSYS"
conditional system call logic

so I actually tend to prefer this ordering when it comes to system calls.

Linus
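
For readers unfamiliar with the stub mechanism referred to above, a sketch of the usual pattern (general kernel convention, assumed here rather than quoted from this series):

	/*
	 * COND_SYSCALL(foo), typically placed in kernel/sys_ni.c, emits a
	 * weak sys_foo() that returns -ENOSYS, so the syscall number can be
	 * reserved and wired up before the real implementation lands.
	 */
	COND_SYSCALL(mseal);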



Re: [PATCH v10 1/5] mseal: Wire up mseal syscall

2024-04-15 Thread Muhammad Usama Anjum
On 4/15/24 9:35 PM, jef...@chromium.org wrote:
> From: Jeff Xu 
> 
> Wire up mseal syscall for all architectures.
It isn't logical to wire up something which isn't present. Please first add
mseal() and then wire it up; please swap the first and second patches. I've
seen this same comment before.

> 
> Signed-off-by: Jeff Xu 
> ---
>  arch/alpha/kernel/syscalls/syscall.tbl  | 1 +
>  arch/arm/tools/syscall.tbl  | 1 +
>  arch/arm64/include/asm/unistd.h | 2 +-
>  arch/arm64/include/asm/unistd32.h   | 2 ++
>  arch/m68k/kernel/syscalls/syscall.tbl   | 1 +
>  arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
>  arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
>  arch/mips/kernel/syscalls/syscall_n64.tbl   | 1 +
>  arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
>  arch/parisc/kernel/syscalls/syscall.tbl | 1 +
>  arch/powerpc/kernel/syscalls/syscall.tbl| 1 +
>  arch/s390/kernel/syscalls/syscall.tbl   | 1 +
>  arch/sh/kernel/syscalls/syscall.tbl | 1 +
>  arch/sparc/kernel/syscalls/syscall.tbl  | 1 +
>  arch/x86/entry/syscalls/syscall_32.tbl  | 1 +
>  arch/x86/entry/syscalls/syscall_64.tbl  | 1 +
>  arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
>  include/uapi/asm-generic/unistd.h   | 5 -
>  kernel/sys_ni.c | 1 +
>  19 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/alpha/kernel/syscalls/syscall.tbl 
> b/arch/alpha/kernel/syscalls/syscall.tbl
> index 8ff110826ce2..d8f96362e9f8 100644
> --- a/arch/alpha/kernel/syscalls/syscall.tbl
> +++ b/arch/alpha/kernel/syscalls/syscall.tbl
> @@ -501,3 +501,4 @@
>  569  common  lsm_get_self_attr   sys_lsm_get_self_attr
>  570  common  lsm_set_self_attr   sys_lsm_set_self_attr
>  571  common  lsm_list_modulessys_lsm_list_modules
> +572  common  mseal   sys_mseal
> diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> index b6c9e01e14f5..2ed7d229c8f9 100644
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -475,3 +475,4 @@
>  459  common  lsm_get_self_attr   sys_lsm_get_self_attr
>  460  common  lsm_set_self_attr   sys_lsm_set_self_attr
>  461  common  lsm_list_modulessys_lsm_list_modules
> +462  common  mseal   sys_mseal
> diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
> index 491b2b9bd553..1346579f802f 100644
> --- a/arch/arm64/include/asm/unistd.h
> +++ b/arch/arm64/include/asm/unistd.h
> @@ -39,7 +39,7 @@
>  #define __ARM_NR_compat_set_tls  (__ARM_NR_COMPAT_BASE + 5)
>  #define __ARM_NR_COMPAT_END  (__ARM_NR_COMPAT_BASE + 0x800)
>  
> -#define __NR_compat_syscalls 462
> +#define __NR_compat_syscalls 463
>  #endif
>  
>  #define __ARCH_WANT_SYS_CLONE
> diff --git a/arch/arm64/include/asm/unistd32.h 
> b/arch/arm64/include/asm/unistd32.h
> index 7118282d1c79..266b96acc014 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -929,6 +929,8 @@ __SYSCALL(__NR_lsm_get_self_attr, sys_lsm_get_self_attr)
>  __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
>  #define __NR_lsm_list_modules 461
>  __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
> +#define __NR_mseal 462
> +__SYSCALL(__NR_mseal, sys_mseal)
>  
>  /*
>   * Please add new compat syscalls above this comment and update
> diff --git a/arch/m68k/kernel/syscalls/syscall.tbl 
> b/arch/m68k/kernel/syscalls/syscall.tbl
> index 7fd43fd4c9f2..22a3cbd4c602 100644
> --- a/arch/m68k/kernel/syscalls/syscall.tbl
> +++ b/arch/m68k/kernel/syscalls/syscall.tbl
> @@ -461,3 +461,4 @@
>  459  common  lsm_get_self_attr   sys_lsm_get_self_attr
>  460  common  lsm_set_self_attr   sys_lsm_set_self_attr
>  461  common  lsm_list_modulessys_lsm_list_modules
> +462  common  mseal   sys_mseal
> diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl 
> b/arch/microblaze/kernel/syscalls/syscall.tbl
> index b00ab2cabab9..2b81a6bd78b2 100644
> --- a/arch/microblaze/kernel/syscalls/syscall.tbl
> +++ b/arch/microblaze/kernel/syscalls/syscall.tbl
> @@ -467,3 +467,4 @@
>  459  common  lsm_get_self_attr   sys_lsm_get_self_attr
>  460  common  lsm_set_self_attr   sys_lsm_set_self_attr
>  461  common  lsm_list_modulessys_lsm_list_modules
> +462  common  mseal   sys_mseal
> diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
> b/arch/mips/kernel/syscalls/syscall_n32.tbl
> index 83cfc9eb6b88..cc869f5d5693 100644
> --- a/arch/mips/kernel/syscalls/syscall_n32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
> @@ -400,3 +400,4 @@
>  459  n32 lsm_get_self_attr   sys_lsm_get_self_attr
>  460  n32 lsm_set_self_attr   sys_lsm_set_self_attr
>  461  n32 lsm_list_modules   

RE: [PATCH v2 00/25] Enable FRED with KVM VMX

2024-04-15 Thread Li, Xin3
> This patch set enables the Intel flexible return and event delivery
> (FRED) architecture with KVM VMX to allow guests to utilize FRED.
> 



> 
> Intel VMX architecture is extended to run FRED guests, and the major changes
> are:
> 
> 1) New VMCS fields for FRED context management, which includes two new
> event data VMCS fields, eight new guest FRED context VMCS fields and eight new
> host FRED context VMCS fields.
> 
> 2) VMX nested-exception support for proper virtualization of stack levels
> introduced with FRED architecture.
> 



> 
> Patch 1-2 are cleanups to VMX basic and misc MSRs, which were sent out earlier
> as a preparation for FRED changes:
> https://lore.kernel.org/kvm/20240206182032.1596-1-xin3...@intel.com/T/#u

Obviously I will drop the 2 cleanup patches in the next iteration.

> Patch 3-15 add FRED support to VMX.
> Patch 16-21 add FRED support to nested VMX.
> Patch 22 exposes FRED and its baseline features to KVM guests.
> Patch 23-25 add FRED selftests.

Please help to review and comment on the FRED KVM/VMX patches.

Thanks!
Xin



Re: [PATCH net-next 4/5] selftests: drv-net: construct environment for running tests which require an endpoint

2024-04-15 Thread Jakub Kicinski
On Mon, 15 Apr 2024 11:28:47 -0400 Willem de Bruijn wrote:
> > If I have to (:
> > Endpoint isn't great.
> > But remote doesn't seem much better, and it doesn't have a nice
> > abbreviation :(  
> 
> It pairs well with local.
> 
> Since in some tests the (local) machine under test is the sender and
> in others it is the receiver, we cannot use SERVER/CLIENT or so.

Alright.

> > > Use FC00::/7 ULA addresses?  
> > 
> > Doesn't ULA have some magic address selection rules which IETF 
> > is just trying to fix now? IIUC 0100:: is the documentation prefix,
> > so shouldn't be too bad?  
> 
> RFC  defines this as the "Discard Prefix".

Alright, let me use Paolo's suggestion of 2001:db8:



[PATCH v10 3/5] selftest mm/mseal memory sealing

2024-04-15 Thread jeffxu
From: Jeff Xu 

selftest for memory sealing change in mmap() and mseal().

Signed-off-by: Jeff Xu 
---
 tools/testing/selftests/mm/.gitignore   |1 +
 tools/testing/selftests/mm/Makefile |1 +
 tools/testing/selftests/mm/mseal_test.c | 1836 +++
 3 files changed, 1838 insertions(+)
 create mode 100644 tools/testing/selftests/mm/mseal_test.c

diff --git a/tools/testing/selftests/mm/.gitignore 
b/tools/testing/selftests/mm/.gitignore
index d26e962f2ac4..98eaa4590f11 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -47,3 +47,4 @@ mkdirty
 va_high_addr_switch
 hugetlb_fault_after_madv
 hugetlb_madv_vs_map
+mseal_test
diff --git a/tools/testing/selftests/mm/Makefile 
b/tools/testing/selftests/mm/Makefile
index eb5f39a2668b..95d10fe1b3c1 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -59,6 +59,7 @@ TEST_GEN_FILES += mlock2-tests
 TEST_GEN_FILES += mrelease_test
 TEST_GEN_FILES += mremap_dontunmap
 TEST_GEN_FILES += mremap_test
+TEST_GEN_FILES += mseal_test
 TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += pagemap_ioctl
 TEST_GEN_FILES += thuge-gen
diff --git a/tools/testing/selftests/mm/mseal_test.c 
b/tools/testing/selftests/mm/mseal_test.c
new file mode 100644
index 000000000000..06c780d1d8e5
--- /dev/null
+++ b/tools/testing/selftests/mm/mseal_test.c
@@ -0,0 +1,1836 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../kselftest.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * need these definitions to build manually using gcc.
+ * gcc -I ../../../../usr/include   -DDEBUG -O3  -DDEBUG -O3 mseal_test.c -o 
mseal_test
+ */
+#ifndef PKEY_DISABLE_ACCESS
+# define PKEY_DISABLE_ACCESS0x1
+#endif
+
+#ifndef PKEY_DISABLE_WRITE
+# define PKEY_DISABLE_WRITE 0x2
+#endif
+
+#ifndef PKEY_BITS_PER_KEY
+#define PKEY_BITS_PER_PKEY  2
+#endif
+
+#ifndef PKEY_MASK
+#define PKEY_MASK   (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
+#endif
+
+#define FAIL_TEST_IF_FALSE(c) do {\
+   if (!(c)) {\
+   ksft_test_result_fail("%s, line:%d\n", __func__, 
__LINE__);\
+   goto test_end;\
+   } \
+   } \
+   while (0)
+
+#define SKIP_TEST_IF_FALSE(c) do {\
+   if (!(c)) {\
+   ksft_test_result_skip("%s, line:%d\n", __func__, 
__LINE__);\
+   goto test_end;\
+   } \
+   } \
+   while (0)
+
+
+#define TEST_END_CHECK() {\
+   ksft_test_result_pass("%s\n", __func__);\
+   return;\
+test_end:\
+   return;\
+}
+
+#ifndef u64
+#define u64 unsigned long long
+#endif
+
+static unsigned long get_vma_size(void *addr, int *prot)
+{
+   FILE *maps;
+   char line[256];
+   int size = 0;
+   uintptr_t  addr_start, addr_end;
+   char protstr[5];
+   *prot = 0;
+
+   maps = fopen("/proc/self/maps", "r");
+   if (!maps)
+   return 0;
+
+   while (fgets(line, sizeof(line), maps)) {
+   if (sscanf(line, "%lx-%lx %4s", &addr_start, &addr_end,
+   &protstr) == 3) {
+   if (addr_start == (uintptr_t) addr) {
+   size = addr_end - addr_start;
+   if (protstr[0] == 'r')
+   *prot |= 0x4;
+   if (protstr[1] == 'w')
+   *prot |= 0x2;
+   if (protstr[2] == 'x')
+   *prot |= 0x1;
+   break;
+   }
+   }
+   }
+   fclose(maps);
+   return size;
+}
+
+/*
+ * define sys_xyx to call syscall directly.
+ */
+static int sys_mseal(void *start, size_t len)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(__NR_mseal, start, len, 0);
+   return sret;
+}
+
+static int sys_mprotect(void *ptr, size_t size, unsigned long prot)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(__NR_mprotect, ptr, size, prot);
+   return sret;
+}
+
+static int sys_mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
+   unsigned long pkey)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(__NR_pkey_mprotect, ptr, size, orig_prot, pkey);
+   return sret;
+}
+
+static void *sys_mmap(void *addr, unsigned long len, unsigned long prot,
+   unsigned long flags, unsigned long fd, unsigned long offset)
+{
+   void *sret;
+
+   errno = 0;
+   sret = (void *) syscall(__NR_mmap, addr, len, prot,
+   flags, fd, offset);
+   return sret;
+}
+
+static int sys_munmap(void *ptr, size_t size)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(__NR_munmap, ptr, size);

[PATCH v10 5/5] selftest mm/mseal read-only elf memory segment

2024-04-15 Thread jeffxu
From: Jeff Xu 

Sealing read-only of elf mapping so it can't be changed by mprotect.

Signed-off-by: Jeff Xu 
---
 tools/testing/selftests/mm/.gitignore |   1 +
 tools/testing/selftests/mm/Makefile   |   1 +
 tools/testing/selftests/mm/seal_elf.c | 183 ++
 3 files changed, 185 insertions(+)
 create mode 100644 tools/testing/selftests/mm/seal_elf.c

diff --git a/tools/testing/selftests/mm/.gitignore 
b/tools/testing/selftests/mm/.gitignore
index 98eaa4590f11..0b9ab987601c 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -48,3 +48,4 @@ va_high_addr_switch
 hugetlb_fault_after_madv
 hugetlb_madv_vs_map
 mseal_test
+seal_elf
diff --git a/tools/testing/selftests/mm/Makefile 
b/tools/testing/selftests/mm/Makefile
index 95d10fe1b3c1..02392c426759 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -60,6 +60,7 @@ TEST_GEN_FILES += mrelease_test
 TEST_GEN_FILES += mremap_dontunmap
 TEST_GEN_FILES += mremap_test
 TEST_GEN_FILES += mseal_test
+TEST_GEN_FILES += seal_elf
 TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += pagemap_ioctl
 TEST_GEN_FILES += thuge-gen
diff --git a/tools/testing/selftests/mm/seal_elf.c 
b/tools/testing/selftests/mm/seal_elf.c
new file mode 100644
index 000000000000..61a2f1c94e02
--- /dev/null
+++ b/tools/testing/selftests/mm/seal_elf.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../kselftest.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * need these definitions to build manually using gcc.
+ * gcc -I ../../../../usr/include   -DDEBUG -O3  -DDEBUG -O3 seal_elf.c -o 
seal_elf
+ */
+#define FAIL_TEST_IF_FALSE(c) do {\
+   if (!(c)) {\
+   ksft_test_result_fail("%s, line:%d\n", __func__, 
__LINE__);\
+   goto test_end;\
+   } \
+   } \
+   while (0)
+
+#define SKIP_TEST_IF_FALSE(c) do {\
+   if (!(c)) {\
+   ksft_test_result_skip("%s, line:%d\n", __func__, 
__LINE__);\
+   goto test_end;\
+   } \
+   } \
+   while (0)
+
+
+#define TEST_END_CHECK() {\
+   ksft_test_result_pass("%s\n", __func__);\
+   return;\
+test_end:\
+   return;\
+}
+
+#ifndef u64
+#define u64 unsigned long long
+#endif
+
+/*
+ * define sys_xyx to call syscall directly.
+ */
+static int sys_mseal(void *start, size_t len)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(__NR_mseal, start, len, 0);
+   return sret;
+}
+
+static void *sys_mmap(void *addr, unsigned long len, unsigned long prot,
+   unsigned long flags, unsigned long fd, unsigned long offset)
+{
+   void *sret;
+
+   errno = 0;
+   sret = (void *) syscall(__NR_mmap, addr, len, prot,
+   flags, fd, offset);
+   return sret;
+}
+
+inline int sys_mprotect(void *ptr, size_t size, unsigned long prot)
+{
+   int sret;
+
+   errno = 0;
+   sret = syscall(__NR_mprotect, ptr, size, prot);
+   return sret;
+}
+
+static bool seal_support(void)
+{
+   int ret;
+   void *ptr;
+   unsigned long page_size = getpagesize();
+
+   ptr = sys_mmap(NULL, page_size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, 
-1, 0);
+   if (ptr == (void *) -1)
+   return false;
+
+   ret = sys_mseal(ptr, page_size);
+   if (ret < 0)
+   return false;
+
+   return true;
+}
+
+const char somestr[4096] = {"READONLY"};
+
+static void test_seal_elf(void)
+{
+   int ret;
+   FILE *maps;
+   char line[512];
+   int size = 0;
+   uintptr_t  addr_start, addr_end;
+   char prot[5];
+   char filename[256];
+   unsigned long page_size = getpagesize();
+   unsigned long long ptr = (unsigned long long) somestr;
+   char *somestr2 = (char *)somestr;
+
+   /*
+* Modify the protection of readonly somestr
+*/
+   if (((unsigned long long)ptr % page_size) != 0)
+   ptr = (unsigned long long)ptr & ~(page_size - 1);
+
+   ksft_print_msg("somestr = %s\n", somestr);
+   ksft_print_msg("change protection to rw\n");
+   ret = sys_mprotect((void *)ptr, page_size, PROT_READ|PROT_WRITE);
+   FAIL_TEST_IF_FALSE(!ret);
+   *somestr2 = 'A';
+   ksft_print_msg("somestr is modified to: %s\n", somestr);
+   ret = sys_mprotect((void *)ptr, page_size, PROT_READ);
+   FAIL_TEST_IF_FALSE(!ret);
+
+   maps = fopen("/proc/self/maps", "r");
+   FAIL_TEST_IF_FALSE(maps);
+
+   /*
+* apply sealing to elf binary
+*/
+   while (fgets(line, sizeof(line), maps)) {
+   if (sscanf(line, "%lx-%lx %4s %*x %*x:%*x %*u %255[^\n]",
+   &addr_start, &addr_end, &prot, &filename) == 4) {
+ 

[PATCH v10 4/5] mseal:add documentation

2024-04-15 Thread jeffxu
From: Jeff Xu 

Add documentation for mseal().

Signed-off-by: Jeff Xu 
---
 Documentation/userspace-api/index.rst |   1 +
 Documentation/userspace-api/mseal.rst | 199 ++
 2 files changed, 200 insertions(+)
 create mode 100644 Documentation/userspace-api/mseal.rst

diff --git a/Documentation/userspace-api/index.rst 
b/Documentation/userspace-api/index.rst
index afecfe3cc4a8..5926115ec0ed 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -20,6 +20,7 @@ System calls
futex2
ebpf/index
ioctl/index
+   mseal
 
 Security-related interfaces
 ===
diff --git a/Documentation/userspace-api/mseal.rst 
b/Documentation/userspace-api/mseal.rst
new file mode 100644
index 000000000000..4132eec995a3
--- /dev/null
+++ b/Documentation/userspace-api/mseal.rst
@@ -0,0 +1,199 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=
+Introduction of mseal
+=
+
+:Author: Jeff Xu 
+
+Modern CPUs support memory permissions such as RW and NX bits. The memory
+permission feature improves security stance on memory corruption bugs, i.e.
+the attacker can’t just write to arbitrary memory and point the code to it,
+the memory has to be marked with X bit, or else an exception will happen.
+
+Memory sealing additionally protects the mapping itself against
+modifications. This is useful to mitigate memory corruption issues where a
+corrupted pointer is passed to a memory management system. For example,
+such an attacker primitive can break control-flow integrity guarantees
+since read-only memory that is supposed to be trusted can become writable
+or .text pages can get remapped. Memory sealing can automatically be
+applied by the runtime loader to seal .text and .rodata pages and
+applications can additionally seal security critical data at runtime.
+
+A similar feature already exists in the XNU kernel with the
+VM_FLAGS_PERMANENT flag [1] and on OpenBSD with the mimmutable syscall [2].
+
+User API
+
+mseal()
+---
+The mseal() syscall has the following signature:
+
+``int mseal(void *addr, size_t len, unsigned long flags)``
+
+**addr/len**: virtual memory address range.
+
+The address range set by ``addr``/``len`` must meet:
+   - The start address must be in an allocated VMA.
+   - The start address must be page aligned.
+   - The end address (``addr`` + ``len``) must be in an allocated VMA.
+   - no gap (unallocated memory) between start and end address.
+
+The ``len`` will be page aligned implicitly by the kernel.
+
+**flags**: reserved for future use.
+
+**return values**:
+
+- ``0``: Success.
+
+- ``-EINVAL``:
+- Invalid input ``flags``.
+- The start address (``addr``) is not page aligned.
+- Address range (``addr`` + ``len``) overflow.
+
+- ``-ENOMEM``:
+- The start address (``addr``) is not allocated.
+- The end address (``addr`` + ``len``) is not allocated.
+- A gap (unallocated memory) between start and end address.
+
+- ``-EPERM``:
+- sealing is supported only on 64-bit CPUs, 32-bit is not supported.
+
+- For above error cases, users can expect the given memory range is
+  unmodified, i.e. no partial update.
+
+- There might be other internal errors/cases not listed here, e.g.
+  error during merging/splitting VMAs, or the process reaching the max
+  number of supported VMAs. In those cases, partial updates to the given
+  memory range could happen. However, those cases should be rare.
+
+**Blocked operations after sealing**:
+Unmapping, moving to another location, and shrinking the size,
+via munmap() and mremap(), can leave an empty space, therefore
+can be replaced with a VMA with a new set of attributes.
+
+Moving or expanding a different VMA into the current location,
+via mremap().
+
+Modifying a VMA via mmap(MAP_FIXED).
+
+Size expansion, via mremap(), does not appear to pose any
+specific risks to sealed VMAs. It is included anyway because
+the use case is unclear. In any case, users can rely on
+merging to expand a sealed VMA.
+
+mprotect() and pkey_mprotect().
+
+Some destructive madvise() behaviors (e.g. MADV_DONTNEED)
+for anonymous memory, when users don't have write permission to the
+memory. Those behaviors can alter region contents by discarding pages,
+effectively a memset(0) for anonymous memory.
+
+Kernel will return -EPERM for blocked operations.
+
+For blocked operations, one can expect the given address is unmodified,
+i.e. no partial update. Note, this is different from existing mm
+system call behaviors, where partial updates are made till an error is
+found and returned to userspace. To give an example:
+
+Assume following code sequence:
+
+- ptr = mmap(null, 8192, PROT_NONE);
+- munmap(ptr + 4096, 4096);
+- ret1 = mprotect(ptr, 8192, PROT_READ);
+- mseal(ptr, 4096);
+- ret2 = mprotect(ptr, 8192, PROT_NONE);
+
+ret1 will 

[PATCH v10 2/5] mseal: add mseal syscall

2024-04-15 Thread jeffxu
From: Jeff Xu 

The new mseal() is a syscall on 64-bit CPUs, with the
following signature:

int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(), can leave an empty space, therefore can
   be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

The following input during the RFC has been incorporated into this patch:

Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Linus Torvalds: assisting in defining system call signature and scope.
Liam R. Howlett: perf optimization.
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

Finally, the idea that inspired this patch comes from Stephen Röttger’s
work in Chrome V8 CFI.

Signed-off-by: Jeff Xu 
---
 include/linux/syscalls.h |   1 +
 mm/Makefile  |   4 +
 mm/internal.h|  37 +
 mm/madvise.c |  12 ++
 mm/mmap.c|  31 +++-
 mm/mprotect.c|  10 ++
 mm/mremap.c  |  31 
 mm/mseal.c   | 307 +++
 8 files changed, 432 insertions(+), 1 deletion(-)
 create mode 100644 mm/mseal.c

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e619ac10cd23..9104952d323d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -821,6 +821,7 @@ asmlinkage long sys_process_mrelease(int pidfd, unsigned 
int flags);
 asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
unsigned long prot, unsigned long pgoff,
unsigned long flags);
+asmlinkage long sys_mseal(unsigned long start, size_t len, unsigned long 
flags);
 asmlinkage long sys_mbind(unsigned long start, unsigned long len,
unsigned long mode,
const unsigned long __user *nmask,
diff --git a/mm/Makefile b/mm/Makefile
index 4abb40b911ec..739811890e36 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -42,6 +42,10 @@ ifdef CONFIG_CROSS_MEMORY_ATTACH
 mmu-$(CONFIG_MMU)  += process_vm_access.o
 endif
 
+ifdef CONFIG_64BIT
+mmu-$(CONFIG_MMU)  += mseal.o
+endif
+
 obj-y  := filemap.o mempool.o oom_kill.o fadvise.o \
   maccess.o page-writeback.o folio-compat.o \
   readahead.o swap.o truncate.o vmscan.o shrinker.o \
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..a858161489b3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1326,6 +1326,43 @@ void __meminit __init_single_page(struct page *page, 
unsigned long pfn,
 unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
  int priority);
 
+#ifdef CONFIG_64BIT
+/* VM is sealed, in vm_flags */
+#define VM_SEALED  _BITUL(63)
+#endif
+
+#ifdef CONFIG_64BIT
+static inline int can_do_mseal(unsigned long flags)
+{
+   if (flags)
+   return -EINVAL;
+
+   return 0;
+}
+
+bool can_modify_mm(struct mm_struct *mm, unsigned long start,
+   unsigned long end);
+bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
+   unsigned long end, int behavior);
+#else
+static inline int can_do_mseal(unsigned long flags)
+{
+   return -EPERM;
+}
+
+static inline bool can_modify_mm(struct mm_struct *mm, unsigned long start,
+   unsigned long end)
+{
+   return true;
+}
+
+static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long 
start,
+   unsigned long end, int behavior)
+{
+   return true;
+}
+#endif
+
 #ifdef CONFIG_SHRINKER_DEBUG
 static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
struct shrinker *shrinker, const char *fmt, va_list ap)
diff --git a/mm/madvise.c b/mm/madvise.c
index 44a498c94158..f7d589534e82 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1394,6 +1394,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned 
long start,
  *  -EIO- an I/O error occurred while paging in data.
  *  -EBADF  - map exists, but area maps something that isn't a file.
  *  -EAGAIN - a kernel resource was temporarily unavailable.
+ *  

[PATCH v10 1/5] mseal: Wire up mseal syscall

2024-04-15 Thread jeffxu
From: Jeff Xu 

Wire up mseal syscall for all architectures.

Signed-off-by: Jeff Xu 
---
 arch/alpha/kernel/syscalls/syscall.tbl  | 1 +
 arch/arm/tools/syscall.tbl  | 1 +
 arch/arm64/include/asm/unistd.h | 2 +-
 arch/arm64/include/asm/unistd32.h   | 2 ++
 arch/m68k/kernel/syscalls/syscall.tbl   | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl| 1 +
 arch/s390/kernel/syscalls/syscall.tbl   | 1 +
 arch/sh/kernel/syscalls/syscall.tbl | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl  | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl  | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl  | 1 +
 arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
 include/uapi/asm-generic/unistd.h   | 5 -
 kernel/sys_ni.c | 1 +
 19 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl 
b/arch/alpha/kernel/syscalls/syscall.tbl
index 8ff110826ce2..d8f96362e9f8 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -501,3 +501,4 @@
 569common  lsm_get_self_attr   sys_lsm_get_self_attr
 570common  lsm_set_self_attr   sys_lsm_set_self_attr
 571common  lsm_list_modulessys_lsm_list_modules
+572common  mseal   sys_mseal
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index b6c9e01e14f5..2ed7d229c8f9 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -475,3 +475,4 @@
 459common  lsm_get_self_attr   sys_lsm_get_self_attr
 460common  lsm_set_self_attr   sys_lsm_set_self_attr
 461common  lsm_list_modulessys_lsm_list_modules
+462common  mseal   sys_mseal
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 491b2b9bd553..1346579f802f 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls   462
+#define __NR_compat_syscalls   463
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 7118282d1c79..266b96acc014 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -929,6 +929,8 @@ __SYSCALL(__NR_lsm_get_self_attr, sys_lsm_get_self_attr)
 __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
 #define __NR_lsm_list_modules 461
 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
+#define __NR_mseal 462
+__SYSCALL(__NR_mseal, sys_mseal)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl 
b/arch/m68k/kernel/syscalls/syscall.tbl
index 7fd43fd4c9f2..22a3cbd4c602 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -461,3 +461,4 @@
 459common  lsm_get_self_attr   sys_lsm_get_self_attr
 460common  lsm_set_self_attr   sys_lsm_set_self_attr
 461common  lsm_list_modulessys_lsm_list_modules
+462common  mseal   sys_mseal
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl 
b/arch/microblaze/kernel/syscalls/syscall.tbl
index b00ab2cabab9..2b81a6bd78b2 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -467,3 +467,4 @@
 459common  lsm_get_self_attr   sys_lsm_get_self_attr
 460common  lsm_set_self_attr   sys_lsm_set_self_attr
 461common  lsm_list_modulessys_lsm_list_modules
+462common  mseal   sys_mseal
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 83cfc9eb6b88..cc869f5d5693 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -400,3 +400,4 @@
 459n32 lsm_get_self_attr   sys_lsm_get_self_attr
 460n32 lsm_set_self_attr   sys_lsm_set_self_attr
 461n32 lsm_list_modulessys_lsm_list_modules
+462n32 mseal   sys_mseal
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl 
b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 532b855df589..1464c6be6eb3 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -376,3 +376,4 @@
 459n64 lsm_get_self_attr 

[PATCH v10 0/5] Introduce mseal

2024-04-15 Thread jeffxu
From: Jeff Xu 

This is the v10 version; it rebases the v9 patches onto 6.9-rc3.
We have also applied and tested mseal() in Chrome and on Chromebooks.

--

This patchset proposes a new mseal() syscall for the Linux kernel.

In a nutshell, mseal() protects the VMAs of a given virtual memory
range against modifications, such as changes to their permission bits.

Modern CPUs support memory permissions, such as the read/write (RW)
and no-execute (NX) bits. Linux has supported NX since the release of
kernel version 2.6.8 in August 2004 [1]. The memory permission feature
improves the security stance on memory corruption bugs, as an attacker
cannot simply write to arbitrary memory and point the code to it. The
memory must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects
the VMA itself against modifications of the selected seal type.

Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped. Memory sealing can
automatically be applied by the runtime loader to seal .text and
.rodata pages and applications can additionally seal security critical
data at runtime. A similar feature already exists in the XNU kernel
with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the
mimmutable syscall [4]. Also, Chrome wants to adopt this feature for
their CFI work [2] and this patchset has been designed to be
compatible with the Chrome use case.

Two system calls are involved in sealing the map:  mmap() and mseal().

The new mseal() is a syscall on 64-bit CPUs, with the
following signature:

int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
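
As an illustration of the intended userspace usage — a hedged sketch based only on the signature above (glibc may not provide a wrapper, so a raw syscall is used, and the syscall number 462 is assumed from the wiring patch):

	#include <errno.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef __NR_mseal
	#define __NR_mseal 462	/* assumed from the wiring patch */
	#endif

	int main(void)
	{
		long page = sysconf(_SC_PAGESIZE);
		size_t len = 2 * page;
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return 1;

		/* Seal the mapping; flags are reserved and must be 0. */
		if (syscall(__NR_mseal, p, len, 0))
			return 1;

		/* Changing protection of a sealed VMA is expected to fail. */
		if (mprotect(p, len, PROT_READ) == -1 && errno == EPERM)
			printf("mprotect on sealed memory rejected (EPERM)\n");

		return 0;
	}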

mseal() blocks following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(), can leave an empty space, therefore can
   be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.

Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in
the case of libc, sealing is only applied to read-only (RO) or
read-execute (RX) memory segments (such as .text and .RELRO) to
prevent them from becoming writable, the lifetime of those mappings
are tied to the lifetime of the process.

Chrome wants to seal two large address space reservations that are
managed by different allocators. The memory is mapped RW- and RWX
respectively but write access to it is restricted using pkeys (or in
the future ARM permission overlay extensions). The lifetime of those
mappings are not tied to the lifetime of the process, therefore, while
the memory is sealed, the allocators still need to free or discard the
unused memory. For example, with madvise(DONTNEED).

However, always allowing madvise(DONTNEED) on this range poses a
security risk. For example if a jump instruction crosses a page
boundary and the second page gets discarded, it will overwrite the
target bytes with zeros and change the control flow. Checking
write-permission before the discard operation allows us to control
when the operation is valid. In this case, the madvise will only
succeed if the executing thread has PKEY write permissions and PKRU
changes are protected in software by control-flow integrity.

Although the initial version of this patch series is targeting the
Chrome browser as its first user, it became evident during upstream
discussions that we would also want to ensure that the patch set
eventually is a complete solution for memory sealing and compatible
with other use cases. The specific scenario currently in mind is
glibc's use case of loading and sealing ELF executables. To this end,
Stephen is working on a change to glibc to add sealing support to the
dynamic linker, which will seal all non-writable segments at startup.
Once this 

[GIT PULL] Kselftest fixes update for Linux 6.9-rc5

2024-04-15 Thread Shuah Khan

Hi Linus,

Please pull the following kselftest fixes update for Linux 6.9-rc5.

This kselftest fixes update for Linux 6.9-rc5 consists of a fix to the
kselftest harness to prevent an infinite loop triggered by an assert
in FIXTURE_TEARDOWN, and a fix for a problem with stopping the
subsystem-enable tests when sched events are being traced.

diff is attached.

thanks,
-- Shuah



The following changes since commit 224fe424c356cb5c8f451eca4127f32099a6f764:

  selftests: dmabuf-heap: add config file for the test (2024-03-29 13:57:14 
-0600)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest 
tags/linux_kselftest-fixes-6.9-rc5

for you to fetch changes up to 72d7cb5c190befbb095bae7737e71560ec0fcaa6:

  selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN 
(2024-04-04 10:50:53 -0600)


linux_kselftest-fixes-6.9-rc5

This kselftest fixes update for Linux 6.9-rc5 consists of a fix to the
kselftest harness to prevent an infinite loop triggered by an assert
in FIXTURE_TEARDOWN, and a fix for a problem with stopping the
subsystem-enable tests when sched events are being traced.


Shengyu Li (1):
  selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN

Yuanhe Shu (1):
  selftests/ftrace: Limit length in subsystem-enable tests

 tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc | 6 +++---
 tools/testing/selftests/kselftest_harness.h | 5 -
 2 files changed, 7 insertions(+), 4 deletions(-)




diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
index b1ede6249866..b7c8f29c09a9 100644
--- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
@@ -18,7 +18,7 @@ echo 'sched:*' > set_event
 
 yield
 
-count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
+count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -lt 3 ]; then
 fail "at least fork, exec and exit events should be recorded"
 fi
@@ -29,7 +29,7 @@ echo 1 > events/sched/enable
 
 yield
 
-count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
+count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -lt 3 ]; then
 fail "at least fork, exec and exit events should be recorded"
 fi
@@ -40,7 +40,7 @@ echo 0 > events/sched/enable
 
 yield
 
-count=`cat trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
+count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
 if [ $count -ne 0 ]; then
 fail "any of scheduler events should not be recorded"
 fi
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 4fd735e48ee7..230d62884885 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -383,6 +383,7 @@
 		FIXTURE_DATA(fixture_name) self; \
 		pid_t child = 1; \
 		int status = 0; \
+		bool jmp = false; \
		memset(&self, 0, sizeof(FIXTURE_DATA(fixture_name))); \
 		if (setjmp(_metadata->env) == 0) { \
 			/* Use the same _metadata. */ \
@@ -399,8 +400,10 @@
 _metadata->exit_code = KSFT_FAIL; \
 			} \
 		} \
+		else \
+			jmp = true; \
 		if (child == 0) { \
-			if (_metadata->setup_completed && !_metadata->teardown_parent) \
+			if (_metadata->setup_completed && !_metadata->teardown_parent && !jmp) \
 fixture_name##_teardown(_metadata, , variant->data); \
 			_exit(0); \
 		} \


Re: [PATCH net-next 1/5] selftests: drv-net: define endpoint structures

2024-04-15 Thread Paolo Abeni
On Mon, 2024-04-15 at 07:19 -0700, Jakub Kicinski wrote:
> On Mon, 15 Apr 2024 10:57:31 +0200 Paolo Abeni wrote:
> > If I read correctly the above will do a full ssh handshake for each
> > command. If the test script/setup is complex, I think/fear the overhead
> > could become a bit cumbersome.
> 
> Connection reuse. I wasn't sure if I should add a hint to the README,
> let me do so.

I'm sorry for the multiple, incremental feedback. I think such a hint in
the README will definitely be useful, thanks!

Paolo




Re: [PATCH net-next 5/5] selftests: drv-net: add a trivial ping test

2024-04-15 Thread Paolo Abeni
On Mon, 2024-04-15 at 07:33 -0700, Jakub Kicinski wrote:
> On Mon, 15 Apr 2024 11:31:05 +0200 Paolo Abeni wrote:
> > On Fri, 2024-04-12 at 16:37 -0700, Jakub Kicinski wrote:
> > > +def ping_v4(cfg) -> None:
> > > +if not cfg.v4:
> > > +raise KsftXfailEx()
> > > +
> > > +cmd(f"ping -c 1 -W0.5 {cfg.ep_v4}")
> > > +cmd(f"ping -c 1 -W0.5 {cfg.v4}", host=cfg.endpoint)  
> > 
> > Very minor nit, I personally find a bit more readable:
> > 
> > cfg.endpoint.cmd()
> > 
> > Which is already supported by the current infra, right?
> > 
> > With both endpoint possibly remote could be:
> > 
> > cfg.ep1.cmd()
> > cfg.ep2.cmd()
> 
> As I said in the cover letter, I don't want to push us too much towards
> classes. The argument format make local and local+remote tests look more
> similar.

I guess it's a matter of personal preferences. I know mine are usually
quite twisted ;)

I'm fine with either syntax.

Cheers,

Paolo




Re: [PATCH] selftests: iommu: add config needed for iommufd_fail_nth

2024-04-15 Thread Jason Gunthorpe
On Sun, Apr 14, 2024 at 07:39:58PM +0500, Muhammad Usama Anjum wrote:
> On 4/5/24 5:10 AM, Jason Gunthorpe wrote:
> > On Mon, Mar 25, 2024 at 02:11:41PM +0500, Muhammad Usama Anjum wrote:
> >> On 3/25/24 2:00 PM, Muhammad Usama Anjum wrote:
> >>> Add FAULT_INJECTION_DEBUG_FS and FAILSLAB configurations which are
> >>> needed by iommufd_fail_nth test.
> >>>
> >>> Signed-off-by: Muhammad Usama Anjum 
> >>> ---
> >>> While building and running these tests on x86, defconfig had these
> >>> configs enabled. But ARM64's defconfig doesn't enable these configs.
> >>> Hence the config options are being added explicitly in this patch.
> >> Please disregard this extra comment. Overall this patch is needed to enable
> >> these config options on both x86 and ARM.
> > 
> > I picked this and the other patch up, thanks
> Not sure why but I'm unable to find this patch in next and in your tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
> 
> Maybe this patch was missed?

ah I made some mistakes, all sorted thanks

Jason



Re: [PATCH net-next 1/5] selftests: drv-net: define endpoint structures

2024-04-15 Thread Paolo Abeni
On Mon, 2024-04-15 at 07:19 -0700, Jakub Kicinski wrote:
> On Mon, 15 Apr 2024 10:57:31 +0200 Paolo Abeni wrote:
> > If I read correctly the above will do a full ssh handshake for each
> > command. If the test script/setup is complex, I think/fear the overhead
> > could become a bit cumbersome.
> 
> Connection reuse. I wasn't sure if I should add a hint to the README,
> let me do so.
> 
> > Would using something alike Fabric to create a single connection at
> > endpoint instantiation time and re-using it for all the command be too
> > much? 
> 
> IDK what "Fabric" is, if its commonly used we can add the option
> in tree. If less commonly - I hope the dynamic loading scheme
> will allow users to very easily drop in their own class that 
> integrates with Fabric, without dirtying the tree? :)

I'm really not a python expert. 'Fabric' is a python library to execute
commands over ssh:

https://www.fabfile.org/

No idea how common it is.

I'm fine with ssh connection sharing.

Thanks,

Paolo 




Re: [PATCH] selftests: Mark ksft_exit_fail_perror() as __noreturn

2024-04-15 Thread Nathan Chancellor
On Sun, Apr 14, 2024 at 11:26:53AM +0500, Muhammad Usama Anjum wrote:
> Let the compilers (clang) know that this function would just call
> exit() and would never return. It is needed to avoid false positive
> static analysis errors. All similar functions calling exit()
> unconditionally have been marked as __noreturn.
> 
> Signed-off-by: Muhammad Usama Anjum 

Reviewed-by: Nathan Chancellor 

> ---
> This patch has been suggested [1] and tested on top of the following
> patches:
> - f07041728422 ("selftests: add ksft_exit_fail_perror()") which is
>   in kselftest tree already
> - ("kselftest: Mark functions that unconditionally call exit() as
>   __noreturn") would appear in tip/timers/urgent
> 
> [1] 
> https://lore.kernel.org/all/8254ab4d-9cb6-402e-80dd-d9ec70d77...@linuxfoundation.org
> ---
>  tools/testing/selftests/kselftest.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/kselftest.h 
> b/tools/testing/selftests/kselftest.h
> index 050c5fd018400..b43a7a7ca4b40 100644
> --- a/tools/testing/selftests/kselftest.h
> +++ b/tools/testing/selftests/kselftest.h
> @@ -372,7 +372,7 @@ static inline __printf(1, 2) int ksft_exit_fail_msg(const 
> char *msg, ...)
>   exit(KSFT_FAIL);
>  }
>  
> -static inline void ksft_exit_fail_perror(const char *msg)
> +static inline __noreturn void ksft_exit_fail_perror(const char *msg)
>  {
>  #ifndef NOLIBC
>   ksft_exit_fail_msg("%s: %s (%d)\n", msg, strerror(errno), errno);
> -- 
> 2.39.2
> 
> 



Re: [PATCH v2 bpf-next 2/6] selftests/bpf: Implement socket kfuncs for bpf_testmod

2024-04-15 Thread Jordan Rife
> It would be better to check if args->msglen > sizeof(arg->msg) although
> this function is just for test cases. Same for args->addr.addrlen.

Ack. I will add this.
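
A minimal sketch of such a check (hypothetical; the field names are assumed from the struct usage visible in the quoted patch):

	/* Illustrative bounds checks before using caller-supplied lengths. */
	if (args->msglen > sizeof(args->msg))
		return -EINVAL;
	if (args->addr.addrlen > sizeof(args->addr.addr))
		return -EINVAL;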

Thanks,
Jordan


On Fri, Apr 12, 2024 at 6:26 PM Kui-Feng Lee  wrote:
>
>
>
> On 4/12/24 09:52, Jordan Rife wrote:
> > This patch adds a set of kfuncs to bpf_testmod that can be used to
> > manipulate a socket from kernel space.
> >
> > Signed-off-by: Jordan Rife 
> > ---
> >   .../selftests/bpf/bpf_testmod/bpf_testmod.c   | 139 ++
> >   .../bpf/bpf_testmod/bpf_testmod_kfunc.h   |  27 
> >   2 files changed, 166 insertions(+)
> >
> > diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c 
> > b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > index 39ad96a18123f..663df8148097e 100644
> > --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > @@ -10,18 +10,29 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> >   #include "bpf_testmod.h"
> >   #include "bpf_testmod_kfunc.h"
> >
> >   #define CREATE_TRACE_POINTS
> >   #include "bpf_testmod-events.h"
> >
> > +#define CONNECT_TIMEOUT_SEC 1
> > +
> >   typedef int (*func_proto_typedef)(long);
> >   typedef int (*func_proto_typedef_nested1)(func_proto_typedef);
> >   typedef int (*func_proto_typedef_nested2)(func_proto_typedef_nested1);
> >
> >   DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
> >   long bpf_testmod_test_struct_arg_result;
> > +static struct socket *sock;
> >
> >   struct bpf_testmod_struct_arg_1 {
> >   int a;
> > @@ -494,6 +505,124 @@ __bpf_kfunc static u32 
> > bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused
> >   return arg;
> >   }
> >
> > +__bpf_kfunc int bpf_kfunc_init_sock(struct init_sock_args *args)
> > +{
> > + int proto;
> > +
> > + if (sock)
> > + pr_warn("%s called without releasing old sock", __func__);
> > +
> > + switch (args->af) {
> > + case AF_INET:
> > + case AF_INET6:
> > + proto = args->type == SOCK_STREAM ? IPPROTO_TCP : IPPROTO_UDP;
> > + break;
> > + case AF_UNIX:
> > + proto = PF_UNIX;
> > + break;
> > + default:
> > + pr_err("invalid address family %d\n", args->af);
> > + return -EINVAL;
> > + }
> > +
> > + return sock_create_kern(&init_net, args->af, args->type, proto, 
> > &sock);
> > +}
> > +
> > +__bpf_kfunc void bpf_kfunc_close_sock(void)
> > +{
> > + if (sock) {
> > + sock_release(sock);
> > + sock = NULL;
> > + }
> > +}
> > +
> > +__bpf_kfunc int bpf_kfunc_call_kernel_connect(struct addr_args *args)
> > +{
> > + /* Set timeout for call to kernel_connect() to prevent it from 
> > hanging,
> > +  * and consider the connection attempt failed if it returns
> > +  * -EINPROGRESS.
> > +  */
> > + sock->sk->sk_sndtimeo = CONNECT_TIMEOUT_SEC * HZ;
> > +
> > + return kernel_connect(sock, (struct sockaddr *)&args->addr,
> > +   args->addrlen, 0);
> > +}
> > +
> > +__bpf_kfunc int bpf_kfunc_call_kernel_bind(struct addr_args *args)
> > +{
> > + return kernel_bind(sock, (struct sockaddr *)&args->addr, 
> > args->addrlen);
> > +}
> > +
> > +__bpf_kfunc int bpf_kfunc_call_kernel_listen(void)
> > +{
> > + return kernel_listen(sock, 128);
> > +}
> > +
> > +__bpf_kfunc int bpf_kfunc_call_kernel_sendmsg(struct sendmsg_args *args)
> > +{
> > + struct msghdr msg = {
> > + .msg_name   = &args->addr.addr,
> > + .msg_namelen= args->addr.addrlen,
> > + };
> > + struct kvec iov;
> > + int err;
> > +
> > + iov.iov_base = args->msg;
> > + iov.iov_len  = args->msglen;
>
> It would be better to check if args->msglen > sizeof(arg->msg) although
> this function is just for test cases. Same for args->addr.addrlen.
>
> > +
> > + err = kernel_sendmsg(sock, &msg, &iov, 1, args->msglen);
> > + args->addr.addrlen = msg.msg_namelen;
> > +
> > + return err;
> > +}
> > +
> > +__bpf_kfunc int bpf_kfunc_call_sock_sendmsg(struct sendmsg_args *args)
> > +{
> > + struct msghdr msg = {
> > + .msg_name   = &args->addr.addr,
> > + .msg_namelen= args->addr.addrlen,
> > + };
> > + struct kvec iov;
> > + int err;
> > +
> > + iov.iov_base = args->msg;
> > + iov.iov_len  = args->msglen;
> > +
> > + iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &iov, 1, args->msglen);
> > + err = sock_sendmsg(sock, &msg);
> > + args->addr.addrlen = msg.msg_namelen;
> > +
> > + return err;
> > +}
> > +
> > +__bpf_kfunc int bpf_kfunc_call_kernel_getsockname(struct addr_args *args)
> > +{
> > + int err;
> > +
> > + err = kernel_getsockname(sock, (struct sockaddr *)&args->addr);
> > + if (err < 0)
> > + goto out;
> > +
> > + args->addrlen = err;
> > + err = 

[PATCH 2/2] selftests: power_supply: Make it POSIX-compliant

2024-04-15 Thread Nícolas F. R. A. Prado
There is one use of bash specific syntax in the script. Change it to the
equivalent POSIX syntax. This doesn't change functionality and allows
the test to be run on shells other than bash.

Reported-by: Mike Looijmans 
Closes: 
https://lore.kernel.org/all/efae4037-c22a-40be-8ba9-7c1c12ece...@topic.nl/
Fixes: 4a679c5afca0 ("selftests: Add test to verify power supply properties")
Signed-off-by: Nícolas F. R. A. Prado 
---
 tools/testing/selftests/power_supply/test_power_supply_properties.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/tools/testing/selftests/power_supply/test_power_supply_properties.sh 
b/tools/testing/selftests/power_supply/test_power_supply_properties.sh
index df272dfe1d2a..a66b1313ed88 100755
--- a/tools/testing/selftests/power_supply/test_power_supply_properties.sh
+++ b/tools/testing/selftests/power_supply/test_power_supply_properties.sh
@@ -23,7 +23,7 @@ count_tests() {
total_tests=0
 
for i in $SUPPLIES; do
-   total_tests=$(("$total_tests" + "$NUM_TESTS"))
+   total_tests=$((total_tests + NUM_TESTS))
done
 
echo "$total_tests"
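
For reference, the behavioural difference is easy to see outside the test
(rough illustration; the exact error text varies between shells):

	a=1; b=2
	echo $(("$a" + "$b"))   # accepted by bash, rejected by dash/busybox ash
	echo $((a + b))         # POSIX form, accepted by all of them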

-- 
2.44.0




[PATCH 1/2] selftests: ktap_helpers: Make it POSIX-compliant

2024-04-15 Thread Nícolas F. R. A. Prado
There are a couple uses of bash specific syntax in the script. Change
them to the equivalent POSIX syntax. This doesn't change functionality
and allows non-bash test scripts to make use of these helpers.

Reported-by: Mike Looijmans 
Closes: 
https://lore.kernel.org/all/efae4037-c22a-40be-8ba9-7c1c12ece...@topic.nl/
Fixes: 2dd0b5a8fcc4 ("selftests: ktap_helpers: Add a helper to finish the test")
Fixes: 14571ab1ad21 ("kselftest: Add new test for detecting unprobed Devicetree 
devices")
Signed-off-by: Nícolas F. R. A. Prado 
---
 tools/testing/selftests/kselftest/ktap_helpers.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kselftest/ktap_helpers.sh 
b/tools/testing/selftests/kselftest/ktap_helpers.sh
index f2fbb914e058..79a125eb24c2 100644
--- a/tools/testing/selftests/kselftest/ktap_helpers.sh
+++ b/tools/testing/selftests/kselftest/ktap_helpers.sh
@@ -43,7 +43,7 @@ __ktap_test() {
directive="$3" # optional
 
local directive_str=
-   [[ ! -z "$directive" ]] && directive_str="# $directive"
+   [ ! -z "$directive" ] && directive_str="# $directive"
 
echo $result $KTAP_TESTNO $description $directive_str
 
@@ -99,7 +99,7 @@ ktap_exit_fail_msg() {
 ktap_finished() {
ktap_print_totals
 
-   if [ $(("$KTAP_CNT_PASS" + "$KTAP_CNT_SKIP")) -eq "$KSFT_NUM_TESTS" ]; 
then
+   if [ $((KTAP_CNT_PASS + KTAP_CNT_SKIP)) -eq "$KSFT_NUM_TESTS" ]; then
exit "$KSFT_PASS"
else
exit "$KSFT_FAIL"
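
Of the two constructs replaced here, [[ ]] is the more visible bashism; as a
quick illustration (the exact failure message varies by shell):

	directive="# SKIP"
	[[ ! -z "$directive" ]] && echo ok   # bash-only; dash/busybox ash reject it
	[ ! -z "$directive" ] && echo ok     # POSIX spelling of the same test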

-- 
2.44.0




[PATCH 0/2] selftests: Make sh helper and power supply test POSIX-compliant

2024-04-15 Thread Nícolas F. R. A. Prado
The patches in this series make the ktap sh helper and the power_supply
selftest POSIX-compliant. Tested with bash, dash and busybox ash.

Signed-off-by: Nícolas F. R. A. Prado 
---
Nícolas F. R. A. Prado (2):
  selftests: ktap_helpers: Make it POSIX-compliant
  selftests: power_supply: Make it POSIX-compliant

 tools/testing/selftests/kselftest/ktap_helpers.sh| 4 ++--
 tools/testing/selftests/power_supply/test_power_supply_properties.sh | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
---
base-commit: 7e74ee01d1754156ed3706b61e793fbd46f5cd7b
change-id: 20240415-supply-selftest-posix-sh-aee99cf85e8f

Best regards,
-- 
Nícolas F. R. A. Prado 




Re: [PATCH net-next 0/5] selftests: drv-net: support testing with a remote system

2024-04-15 Thread Willem de Bruijn
Jakub Kicinski wrote:
> Hi!
> 
> Implement support for tests which require access to a remote system /
> endpoint which can generate traffic.
> This series concludes the "groundwork" for upstream driver tests.
> 
> I wanted to support the three models which came up in discussions:
>  - SW testing with netdevsim
>  - "local" testing with two ports on the same system in a loopback
>  - "remote" testing via SSH
> so there is a tiny bit of an abstraction which wraps up how "remote"
> commands are executed. Otherwise hopefully there's nothing surprising.
> 
> I'm only adding a ping test. I had a bigger one written but I was
> worried we'll get into discussing the details of the test itself
> and how I chose to hack up netdevsim, instead of the test infra...
> So that test will be a follow up :)
> 
> ---
> 
> TBH, this series is on top of the one I posted in the morning:
> https://lore.kernel.org/all/20240412141436.828666-1-k...@kernel.org/
> but it applies cleanly, and all it needs is the ifindex definition
> in netdevsim. Testing with real HW works fine even without the other
> series.
> 
> Jakub Kicinski (5):
>   selftests: drv-net: define endpoint structures
>   selftests: drv-net: add stdout to the command failed exception
>   selftests: drv-net: factor out parsing of the env
>   selftests: drv-net: construct environment for running tests which
> require an endpoint
>   selftests: drv-net: add a trivial ping test

For the series:

Reviewed-by: Willem de Bruijn 

I left some comments for discussion, but did not spell out the more
important part: series looks great to me. Thanks for building this!
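
For readers following the thread, the local/remote split being discussed
boils down to something like the sketch below (Python; class and method
names are made up for illustration and are not the series' actual API):

    import subprocess

    class LocalRemote:
        """Run test commands on the local machine (netdevsim / loopback case)."""
        def cmd(self, command):
            return subprocess.run(command, shell=True, text=True,
                                  capture_output=True, check=False)

    class SshRemote:
        """Run the same commands on an SSH-reachable endpoint."""
        def __init__(self, host):
            self.host = host
        def cmd(self, command):
            return subprocess.run(["ssh", self.host, command], text=True,
                                  capture_output=True, check=False)

    # A test only sees an object with .cmd(), so it does not care whether the
    # endpoint is netdevsim, a second local port, or a real remote machine:
    #   remote = SshRemote("remote-host")
    #   remote.cmd("ping -c 1 -W 1 192.0.2.1")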



Re: [PATCH net-next 4/5] selftests: drv-net: construct environment for running tests which require an endpoint

2024-04-15 Thread Willem de Bruijn
Jakub Kicinski wrote:
> On Sun, 14 Apr 2024 12:45:43 -0400 Willem de Bruijn wrote:
> > Overall, this is really cool stuff (obviously)!
> > 
> > REMOTE instead of EP?
> 
> If I have to (:
> Endpoint isn't great.
> But remote doesn't seem much better, and it doesn't have a nice
> abbreviation :(

It pairs well with local.

Since in some tests the (local) machine under test is the sender and
in others it is the receiver, we cannot use SERVER/CLIENT or so.
 
> > Apparently I missed the earlier discussion. Would it also be possible
> > to have both sides be remote. Where the test runner might run on the
> > build host, but the kernel under test is run on two test machines.
> > 
> > To a certain extent, same for having two equivalent child network
> > namespaces isolated from the runner's environment.
> 
> I was thinking about it (and even wrote one large test which uses
> 2 namespaces [1]). But I could not convince myself that the added
> complication is worth it.
> 
> [1] https://github.com/kuba-moo/linux/blob/psp/tools/net/ynl/psp.py
> 
> Local namespace testing is one thing, entering the namespace from
> python and using the right process abstraction to make sure garbage
> collector doesn't collect the namespace before the test exits it
> (sigh) is all doable. But we lose the ability to interact with the local
> system directly when the endpoint is remote. No local FW access with
> read/write, we have to "cat" and "echo" like in bash. No YNL access,
> unless we ship specs and CLI over.

In cases like testing jumbo frames (or other MTU, like 4K),
configuration changes will have to be made on both the machine under
test and the remote traffic generator/sink. It seems to me
unavoidable. Most of the two-machine tests I can think of require an equal
of setup on both sides. But again, cart before the horse. We can
always revisit this later if needed.
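
(Concretely, for something like a jumbo frame test both ends would need the
same MTU bump, roughly:

    ip link set "$LOCAL_IFNAME" mtu 9000
    ssh "$REMOTE" ip link set "$REMOTE_IFNAME" mtu 9000

with the variable names above being placeholders, not anything defined by
this series.)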
 
> So I concluded that we're better off leaning on kselftest for
> remote/remote. make install, copy the tests over, run them remotely.
> I may be biased tho, I don't have much use for remote/remote in my
> development env.
> 
> > Use FC00::/7 ULA addresses?
> 
> Doesn't ULA have some magic address selection rules which IETF 
> is just trying to fix now? IIUC 0100:: is the documentation prefix,
> so shouldn't be too bad?

RFC 6666 defines this as the "Discard Prefix".



