Re: [PATCH 2/2] powerpc/{32,book3e}: kcsan: Extend KCSAN Support

2023-02-15 Thread Christophe Leroy


On 16/02/2023 at 06:09, Rohan McLure wrote:
> Enable HAVE_ARCH_KCSAN on all powerpc platforms, permitting use of the
> kernel concurrency sanitiser through the CONFIG_KCSAN_* kconfig options.
> 
> Boots and passes selftests on 32-bit and 64-bit platforms. See
> documentation in Documentation/dev-tools/kcsan.rst for more information.
> 
> Signed-off-by: Rohan McLure 
> ---
> New patch
> ---
>   arch/powerpc/Kconfig | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 2c9cdf1d8761..45771448d47a 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -197,7 +197,7 @@ config PPC
>   select HAVE_ARCH_KASAN  if PPC_RADIX_MMU
>   select HAVE_ARCH_KASAN  if PPC_BOOK3E_64
>   select HAVE_ARCH_KASAN_VMALLOC  if HAVE_ARCH_KASAN
> - select HAVE_ARCH_KCSAN  if PPC_BOOK3S_64
> + select HAVE_ARCH_KCSAN

So this is a follow-up to a not-yet-posted v5 of the other series?
Why not just add patch 1 to that series and have KCSAN for all powerpc
at once?

>   select HAVE_ARCH_KFENCE if ARCH_SUPPORTS_DEBUG_PAGEALLOC
>   select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>   select HAVE_ARCH_KGDB


Re: [PATCH 1/2] kcsan: xtensa: Add atomic builtin stubs for 32-bit systems

2023-02-15 Thread Christophe Leroy


On 16/02/2023 at 06:09, Rohan McLure wrote:
> KCSAN instruments calls to atomic builtins, and will in turn call these
> builtins itself. As such, architectures supporting KCSAN must have
> compiler support for these atomic primitives.
> 
> Since 32-bit systems are unlikely to have 64-bit compiler builtins,
> provide a stub for each missing builtin, and use BUG() to assert
> unreachability.
> 
> In commit 725aea873261 ("xtensa: enable KCSAN"), xtensa implemented these
> locally. Move these definitions so they are accessible to all 32-bit
> architectures that do not provide the necessary builtins, with opt-in
> for PowerPC and xtensa.
> 
> Signed-off-by: Rohan McLure 
> Reviewed-by: Max Filippov 

This series should also be addressed to the KCSAN maintainers, shouldn't it?

KCSAN
M:  Marco Elver 
R:  Dmitry Vyukov 
L:  kasan-...@googlegroups.com
S:  Maintained
F:  Documentation/dev-tools/kcsan.rst
F:  include/linux/kcsan*.h
F:  kernel/kcsan/
F:  lib/Kconfig.kcsan
F:  scripts/Makefile.kcsan


> ---
> Previously issued as a part of a patch series adding KCSAN support to
> 64-bit.
> Link: 
> https://lore.kernel.org/linuxppc-dev/167646486000.1421441.10070059569986228558.b4...@ellerman.id.au/T/#t
> v1: Remove __has_builtin check, as gcc is not obligated to inline
> builtins detected using this check, but instead is permitted to supply
> them in libatomic:
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108734
> Instead, opt-in PPC32 and xtensa.
> ---
>   arch/xtensa/lib/Makefile  | 1 -
>   kernel/kcsan/Makefile | 2 ++
>   arch/xtensa/lib/kcsan-stubs.c => kernel/kcsan/stubs.c | 0
>   3 files changed, 2 insertions(+), 1 deletion(-)
>   rename arch/xtensa/lib/kcsan-stubs.c => kernel/kcsan/stubs.c (100%)
> 
> diff --git a/arch/xtensa/lib/Makefile b/arch/xtensa/lib/Makefile
> index 7ecef0519a27..d69356dc97df 100644
> --- a/arch/xtensa/lib/Makefile
> +++ b/arch/xtensa/lib/Makefile
> @@ -8,5 +8,4 @@ lib-y += memcopy.o memset.o checksum.o \
>  divsi3.o udivsi3.o modsi3.o umodsi3.o mulsi3.o umulsidi3.o \
>  usercopy.o strncpy_user.o strnlen_user.o
>   lib-$(CONFIG_PCI) += pci-auto.o
> -lib-$(CONFIG_KCSAN) += kcsan-stubs.o
>   KCSAN_SANITIZE_kcsan-stubs.o := n
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> index 8cf70f068d92..86dd713d8855 100644
> --- a/kernel/kcsan/Makefile
> +++ b/kernel/kcsan/Makefile
> @@ -12,6 +12,8 @@ CFLAGS_core.o := $(call cc-option,-fno-conserve-stack) \
>   -fno-stack-protector -DDISABLE_BRANCH_PROFILING
>   
>   obj-y := core.o debugfs.o report.o
> +obj-$(CONFIG_PPC32) += stubs.o
> +obj-$(CONFIG_XTENSA) += stubs.o

Not sure it is acceptable to do it that way.

There should likely be something like a CONFIG_ARCH_WANTS_KCSAN_STUBS in
KCSAN's Kconfig, which PPC32 and XTENSA would then select.
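
Something along these lines would be a sketch of that suggestion (the
symbol name above is only a proposal):

In lib/Kconfig.kcsan:

	config ARCH_WANTS_KCSAN_STUBS
		bool

In kernel/kcsan/Makefile:

	obj-$(CONFIG_ARCH_WANTS_KCSAN_STUBS) += stubs.o

And in the arch Kconfig, e.g. arch/powerpc/Kconfig:

	select ARCH_WANTS_KCSAN_STUBS if PPC32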

>   
>   KCSAN_INSTRUMENT_BARRIERS_selftest.o := y
>   obj-$(CONFIG_KCSAN_SELFTEST) += selftest.o
> diff --git a/arch/xtensa/lib/kcsan-stubs.c b/kernel/kcsan/stubs.c
> similarity index 100%
> rename from arch/xtensa/lib/kcsan-stubs.c
> rename to kernel/kcsan/stubs.c
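
For reference, the stubs being moved are of this shape: a minimal sketch
of two of them, following the GCC 64-bit atomic builtin ABI
(reconstructed for illustration, not copied from the file):

	void __atomic_store_8(volatile void *p, u64 v, int memorder)
	{
		BUG();
	}

	u64 __atomic_load_8(volatile void *p, int memorder)
	{
		BUG();
	}

They exist only to satisfy the linker: on a 32-bit kernel the
instrumented 64-bit atomics must never actually be reached, hence the
BUG().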


[PATCH] powerpc/pseries: Fix endianness issue when parsing PLPKS secvar flags

2023-02-15 Thread Andrew Donnellan
When a user updates a variable through the PLPKS secvar interface, we take
the first 8 bytes of the data written to the update attribute to pass
through to the H_PKS_SIGNED_UPDATE hcall as flags. These bytes are always
written in big-endian format.

Currently, the flags bytes are memcpy()ed into a u64, which is then loaded
into a register to pass as part of the hcall. This means that on LE
systems, the bytes are in the wrong order.

Use be64_to_cpup() instead, to ensure the flags bytes are byteswapped if
necessary.
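
For illustration, a userspace sketch of the bug (hypothetical example,
with glibc's be64toh() standing in for the kernel's be64_to_cpup()):

	#include <endian.h>
	#include <inttypes.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		/* Flag bytes as written by the user: big-endian 0x1 */
		uint8_t data[8] = { 0, 0, 0, 0, 0, 0, 0, 1 };
		uint64_t flags;

		memcpy(&flags, data, sizeof(flags));
		/* On LE this prints 0x0100000000000000: the old, broken result */
		printf("memcpy only: 0x%016" PRIx64 "\n", flags);

		flags = be64toh(flags);
		/* Prints 0x0000000000000001 on any host: the fixed result */
		printf("byteswapped: 0x%016" PRIx64 "\n", flags);
		return 0;
	}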

Reported-by: Stefan Berger 
Fixes: ccadf154cb00 ("powerpc/pseries: Implement secvars for dynamic secure boot")
Signed-off-by: Andrew Donnellan 
---
 arch/powerpc/platforms/pseries/plpks-secvar.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/plpks-secvar.c 
b/arch/powerpc/platforms/pseries/plpks-secvar.c
index f6cf4076..257fd1f8bc19 100644
--- a/arch/powerpc/platforms/pseries/plpks-secvar.c
+++ b/arch/powerpc/platforms/pseries/plpks-secvar.c
@@ -135,7 +135,8 @@ static int plpks_set_variable(const char *key, u64 key_len, 
u8 *data,
goto err;
var.namelen = rc * 2;
 
-   memcpy(&flags, data, sizeof(flags));
+   // Flags are contained in the first 8 bytes of the buffer, and are always big-endian
+   flags = be64_to_cpup((__be64 *)data);
 
var.datalen = data_size - sizeof(flags);
var.data = data + sizeof(flags);
-- 
2.39.1



[powerpc:next] BUILD SUCCESS b0ae5b6f3c298a005b73556740526c0e24a5633c

2023-02-15 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next
branch HEAD: b0ae5b6f3c298a005b73556740526c0e24a5633c  powerpc/kexec_file: 
print error string on usable memory property update failure

elapsed time: 1071m

configs tested: 85
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
alphaallyesconfig
alpha   defconfig
arc  allyesconfig
arc defconfig
arc  randconfig-r043-20230212
arc  randconfig-r043-20230213
arc  randconfig-r043-20230214
arm  allmodconfig
arm  allyesconfig
arm defconfig
arm  randconfig-r046-20230212
arm  randconfig-r046-20230214
arm64allyesconfig
arm64   defconfig
cskydefconfig
i386 allyesconfig
i386  debian-10.3
i386defconfig
i386 randconfig-a011-20230213
i386 randconfig-a012-20230213
i386 randconfig-a013-20230213
i386 randconfig-a014-20230213
i386 randconfig-a015-20230213
i386 randconfig-a016-20230213
ia64 allmodconfig
ia64defconfig
loongarchallmodconfig
loongarch allnoconfig
loongarch   defconfig
m68k allmodconfig
m68kdefconfig
mips allmodconfig
mips allyesconfig
nios2   defconfig
parisc  defconfig
parisc64defconfig
powerpc  allmodconfig
powerpc   allnoconfig
riscvallmodconfig
riscv allnoconfig
riscv   defconfig
riscvrandconfig-r042-20230213
riscv  rv32_defconfig
s390 allmodconfig
s390 allyesconfig
s390defconfig
s390 randconfig-r044-20230213
sh   allmodconfig
sparc   defconfig
um i386_defconfig
um   x86_64_defconfig
x86_64allnoconfig
x86_64   allyesconfig
x86_64  defconfig
x86_64  kexec
x86_64   randconfig-a011-20230213
x86_64   randconfig-a012-20230213
x86_64   randconfig-a013-20230213
x86_64   randconfig-a014-20230213
x86_64   randconfig-a015-20230213
x86_64   randconfig-a016-20230213
x86_64   rhel-8.3

clang tested configs:
arm  randconfig-r046-20230213
hexagon  randconfig-r041-20230212
hexagon  randconfig-r041-20230213
hexagon  randconfig-r041-20230214
hexagon  randconfig-r045-20230212
hexagon  randconfig-r045-20230213
hexagon  randconfig-r045-20230214
i386 randconfig-a001-20230213
i386 randconfig-a002-20230213
i386 randconfig-a003-20230213
i386 randconfig-a004-20230213
i386 randconfig-a005-20230213
i386 randconfig-a006-20230213
riscvrandconfig-r042-20230212
riscvrandconfig-r042-20230214
s390 randconfig-r044-20230212
s390 randconfig-r044-20230214
x86_64   randconfig-a001-20230213
x86_64   randconfig-a002-20230213
x86_64   randconfig-a003-20230213
x86_64   randconfig-a004-20230213
x86_64   randconfig-a005-20230213
x86_64   randconfig-a006-20230213

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


Re: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC

2023-02-15 Thread Christophe Leroy


On 15/02/2023 at 23:44, Leo Li wrote:
> 
> 
>> -Original Message-
>> From: Herve Codina 
>> Sent: Thursday, January 26, 2023 2:32 AM
>> To: Herve Codina ; Leo Li
>> ; Rob Herring ; Krzysztof
>> Kozlowski ; Liam Girdwood
>> ; Mark Brown ; Christophe
>> Leroy ; Michael Ellerman
>> ; Nicholas Piggin ; Qiang Zhao
>> ; Jaroslav Kysela ; Takashi Iwai
>> ; Shengjiu Wang ; Xiubo Li
>> ; Fabio Estevam ; Nicolin
>> Chen 
>> Cc: linuxppc-dev@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org;
>> devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; alsa-devel@alsa-
>> project.org; Thomas Petazzoni 
>> Subject: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC
> 
> Typo: cpm1
> 
>>
>> The QMC (QUICC Multichannel Controller) emulates up to 64 channels within
>> one serial controller using the same TDM physical interface routed from the
>> TSA.
>>
>> It is available in some PowerQUICC SoCs such as the
>> MPC885 or MPC866.
>>
>> It is also available on some QUICC Engine SoCs.
>> The current version supports CPM1 SoCs only; some enhancements are
>> needed to support QUICC Engine SoCs.
>>
>> Signed-off-by: Herve Codina 
> 
> Otherwise looks good to me.
> 
> Acked-by: Li Yang 

Thanks for the review and the ack.

Were you also able to have a look at patch 2, which implements support
for the timeslot assigner (TSA)?

Christophe


Re: 6.2-rc7 fails building on Talos II: memory.c:(.text+0x2e14): undefined reference to `hash__tlb_flush'

2023-02-15 Thread Christophe Leroy


On 16/02/2023 at 00:55, Erhard F. wrote:
> Just noticed a build failure on 6.2-rc7 for my Talos 2 (.config attached):
> 
>   # make
>    CALL    scripts/checksyscalls.sh
>UPD include/generated/utsversion.h
>CC  init/version-timestamp.o
>LD  .tmp_vmlinux.kallsyms1
> ld: ld: DWARF error: could not find abbrev number 6
> mm/memory.o: in function `unmap_page_range':
> memory.c:(.text+0x2e14): undefined reference to `hash__tlb_flush'
> ld: memory.c:(.text+0x2f8c): undefined reference to `hash__tlb_flush'
> ld: ld: DWARF error: could not find abbrev number 3117
> mm/mmu_gather.o: in function `tlb_remove_table':
> mmu_gather.c:(.text+0x584): undefined reference to `hash__tlb_flush'
> ld: mmu_gather.c:(.text+0x6c4): undefined reference to `hash__tlb_flush'
> ld: mm/mmu_gather.o: in function `tlb_flush_mmu':
> mmu_gather.c:(.text+0x80c): undefined reference to `hash__tlb_flush'
> ld: mm/mmu_gather.o:mmu_gather.c:(.text+0xbe0): more undefined references to 
> `hash__tlb_flush' follow
> make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Fehler 1
> make: *** [Makefile:1264: vmlinux] Error 2
> 
> As 6.2-rc6 was good on this machine I did a quick bisect which revealed this 
> commit:
> 
>   # git bisect bad
> 1665c027afb225882a5a0b014c45e84290b826c2 is the first bad commit
> commit 1665c027afb225882a5a0b014c45e84290b826c2
> Author: Michael Ellerman 
> Date:   Tue Jan 31 22:14:07 2023 +1100
> 
>  powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()
>  
>  Commit baf1ed24b27d ("powerpc/mm: Remove empty hash__ functions")
>  removed some empty hash MMU flushing routines, but got a bit overeager
>  and also removed the call to hash__tlb_flush() from tlb_flush().
>  
>  In regular use this doesn't lead to any noticeable breakage, which is a
>  little concerning. Presumably there are flushes happening via other
>  paths such as arch_leave_lazy_mmu_mode(), and/or a bit of luck.
>  
>  Fix it by reinstating the call to hash__tlb_flush().
>  
>  Fixes: baf1ed24b27d ("powerpc/mm: Remove empty hash__ functions")
>  Signed-off-by: Michael Ellerman 
>  Link: 
> https://lore.kernel.org/r/2023013407.806770-1-...@ellerman.id.au
> 
>   arch/powerpc/include/asm/book3s/64/tlbflush.h | 2 ++
>   1 file changed, 2 insertions(+)
> 

Can you try with :

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index d5cd16270c5d..2bbc0fcce04a 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -97,8 +97,8 @@ static inline void tlb_flush(struct mmu_gather *tlb)
  {
if (radix_enabled())
radix__tlb_flush(tlb);
-
-   return hash__tlb_flush(tlb);
+   else
+   hash__tlb_flush(tlb);
  }

  #ifdef CONFIG_SMP



[PATCH] powerpc/perf: Add json metric events to present CPI stall cycles in powerpc

2023-02-15 Thread Athira Rajeev
The Power10 Performance Monitoring Unit (PMU) provides events
to understand stall cycles of different pipeline stages.
These events, along with completed instructions, provide
useful metrics for application tuning.

The patch implements the json changes to collect counter statistics
and present the high-level CPI stall breakdown metrics. The new metric
group is named "CPI_STALL_RATIO" and presents these stall metrics:
- DISPATCHED_CPI (dispatch stall cycles per insn)
- ISSUE_STALL_CPI (issue stall cycles per insn)
- EXECUTION_STALL_CPI (execution stall cycles per insn)
- COMPLETION_STALL_CPI (completion stall cycles per insn)

To avoid multiplexing of events, the PM_RUN_INST_CMPL event has been
modified to use PMC5 (performance monitoring counter 5) instead
of PMC4. This change is needed since the completion stall event
already uses PMC4.

Usage example:

 ./perf stat --metric-no-group -M CPI_STALL_RATIO 

 Performance counter stats for 'workload':

        63,056,817,982  PM_CMPL_STALL        #  0.28 COMPLETION_STALL_CPI
     1,743,988,038,896  PM_ISSUE_STALL       #  7.73 ISSUE_STALL_CPI
       225,597,495,030  PM_RUN_INST_CMPL     #  6.18 DISPATCHED_CPI
                                             # 37.48 EXECUTION_STALL_CPI
     1,393,916,546,654  PM_DISP_STALL_CYC
     8,455,376,836,463  PM_EXEC_STALL

"--metric-no-group" is used for forcing PM_RUN_INST_CMPL to be scheduled
in all group for more accuracy.
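
As a sanity check, the printed ratios match the raw counts:
1,393,916,546,654 / 225,597,495,030 ≈ 6.18 (DISPATCHED_CPI),
1,743,988,038,896 / 225,597,495,030 ≈ 7.73 (ISSUE_STALL_CPI),
8,455,376,836,463 / 225,597,495,030 ≈ 37.48 (EXECUTION_STALL_CPI) and
63,056,817,982 / 225,597,495,030 ≈ 0.28 (COMPLETION_STALL_CPI).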

Signed-off-by: Athira Rajeev 
---
 tools/perf/pmu-events/arch/powerpc/power10/metrics.json | 8 ++++----
 tools/perf/pmu-events/arch/powerpc/power10/others.json  | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json 
b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
index b57526fa44f2..6f53583a0c62 100644
--- a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
+++ b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
@@ -15,7 +15,7 @@
 {
 "BriefDescription": "Average cycles per completed instruction when 
dispatch was stalled for any reason",
 "MetricExpr": "PM_DISP_STALL_CYC / PM_RUN_INST_CMPL",
-"MetricGroup": "CPI",
+"MetricGroup": "CPI;CPI_STALL_RATIO",
 "MetricName": "DISPATCHED_CPI"
 },
 {
@@ -147,13 +147,13 @@
 {
 "BriefDescription": "Average cycles per completed instruction when the 
NTC instruction has been dispatched but not issued for any reason",
 "MetricExpr": "PM_ISSUE_STALL / PM_RUN_INST_CMPL",
-"MetricGroup": "CPI",
+"MetricGroup": "CPI;CPI_STALL_RATIO",
 "MetricName": "ISSUE_STALL_CPI"
 },
 {
 "BriefDescription": "Average cycles per completed instruction when the 
NTC instruction is waiting to be finished in one of the execution units",
 "MetricExpr": "PM_EXEC_STALL / PM_RUN_INST_CMPL",
-"MetricGroup": "CPI",
+"MetricGroup": "CPI;CPI_STALL_RATIO",
 "MetricName": "EXECUTION_STALL_CPI"
 },
 {
@@ -309,7 +309,7 @@
 {
 "BriefDescription": "Average cycles per completed instruction when the 
NTC instruction cannot complete because the thread was blocked",
 "MetricExpr": "PM_CMPL_STALL / PM_RUN_INST_CMPL",
-"MetricGroup": "CPI",
+"MetricGroup": "CPI;CPI_STALL_RATIO",
 "MetricName": "COMPLETION_STALL_CPI"
 },
 {
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json 
b/tools/perf/pmu-events/arch/powerpc/power10/others.json
index 7d0de1a2860b..a771e4b6bec5 100644
--- a/tools/perf/pmu-events/arch/powerpc/power10/others.json
+++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json
@@ -265,7 +265,7 @@
 "BriefDescription": "Load Missed L1, counted at finish time."
   },
   {
-"EventCode": "0x400FA",
+"EventCode": "0x500FA",
 "EventName": "PM_RUN_INST_CMPL",
 "BriefDescription": "Completed PowerPC instructions gated by the run 
latch."
   }
-- 
2.31.1



Re: 6.2-rc7 fails building on Talos II: memory.c:(.text+0x2e14): undefined reference to `hash__tlb_flush'

2023-02-15 Thread Benjamin Gray
On Thu, 2023-02-16 at 00:55 +0100, Erhard F. wrote:
> Just noticed a build failure on 6.2-rc7 for my Talos 2 (.config
> attached):
> 
>  # make
>   CALL    scripts/checksyscalls.sh
>   UPD include/generated/utsversion.h
>   CC  init/version-timestamp.o
>   LD  .tmp_vmlinux.kallsyms1
> ld: ld: DWARF error: could not find abbrev number 6
> mm/memory.o: in function `unmap_page_range':
> memory.c:(.text+0x2e14): undefined reference to `hash__tlb_flush'
> ld: memory.c:(.text+0x2f8c): undefined reference to `hash__tlb_flush'
> ld: ld: DWARF error: could not find abbrev number 3117
> mm/mmu_gather.o: in function `tlb_remove_table':
> mmu_gather.c:(.text+0x584): undefined reference to `hash__tlb_flush'
> ld: mmu_gather.c:(.text+0x6c4): undefined reference to
> `hash__tlb_flush'
> ld: mm/mmu_gather.o: in function `tlb_flush_mmu':
> mmu_gather.c:(.text+0x80c): undefined reference to `hash__tlb_flush'
> ld: mm/mmu_gather.o:mmu_gather.c:(.text+0xbe0): more undefined
> references to `hash__tlb_flush' follow
> make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Fehler 1
> make: *** [Makefile:1264: vmlinux] Error 2
> 
> As 6.2-rc6 was good on this machine I did a quick bisect which
> revealed this commit:
> 
>  # git bisect bad
> 1665c027afb225882a5a0b014c45e84290b826c2 is the first bad commit
> commit 1665c027afb225882a5a0b014c45e84290b826c2
> Author: Michael Ellerman 
> Date:   Tue Jan 31 22:14:07 2023 +1100
> 
>     powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()
>     
>     Commit baf1ed24b27d ("powerpc/mm: Remove empty hash__ functions")
>     removed some empty hash MMU flushing routines, but got a bit
> overeager
>     and also removed the call to hash__tlb_flush() from tlb_flush().
>     
>     In regular use this doesn't lead to any noticeable breakage, which
> is a
>     little concerning. Presumably there are flushes happening via
> other
>     paths such as arch_leave_lazy_mmu_mode(), and/or a bit of luck.
>     
>     Fix it by reinstating the call to hash__tlb_flush().
>     
>     Fixes: baf1ed24b27d ("powerpc/mm: Remove empty hash__ functions")
>     Signed-off-by: Michael Ellerman 
>     Link:
> https://lore.kernel.org/r/2023013407.806770-1-...@ellerman.id.au
> 
>  arch/powerpc/include/asm/book3s/64/tlbflush.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> 
> Regards,
> Erhard

Looks like the `return` on the radix version wasn't added back, so it
falls through to the hash call too.
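
That is, after the reinstating commit the function presumably reads
(reconstructed from the diff upthread, not checked against the tree):

	static inline void tlb_flush(struct mmu_gather *tlb)
	{
		if (radix_enabled())
			radix__tlb_flush(tlb);

		return hash__tlb_flush(tlb);
	}

so hash__tlb_flush() is referenced even on configs where the hash MMU
code is compiled out, which matches the link errors above.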



[PATCH v3 35/35] mm: separate vma->lock from vm_area_struct

2023-02-15 Thread Suren Baghdasaryan
vma->lock being part of the vm_area_struct causes a performance regression
during page faults: under contention its count and owner fields are
constantly updated, and having other parts of vm_area_struct used during
page fault handling next to them causes constant cache line bouncing.
Fix that by moving the lock outside of the vm_area_struct.
All attempts to keep vma->lock inside vm_area_struct in a separate
cache line still produce a performance regression, especially on NUMA
machines. The smallest regression was achieved when the lock was placed
in the fourth cache line, but that bloats vm_area_struct to 256 bytes.
Considering performance and memory impact, a separate lock looks like
the best option. It increases the memory footprint of each VMA, but that
can be optimized later if the new size causes issues.
Note that after this change vma_init() does not allocate or
initialize vma->lock anymore. A number of drivers allocate a pseudo
VMA on the stack but they never use the VMA's lock, therefore it does
not need to be allocated. The future drivers which might need the VMA
lock should use vm_area_alloc()/vm_area_free() to allocate the VMA.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h   | 23 ++---
 include/linux/mm_types.h |  6 +++-
 kernel/fork.c| 73 
 3 files changed, 74 insertions(+), 28 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cedef02dfd2b..96b18ef3bfa3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -627,12 +627,6 @@ struct vm_operations_struct {
 };
 
 #ifdef CONFIG_PER_VMA_LOCK
-static inline void vma_init_lock(struct vm_area_struct *vma)
-{
-   init_rwsem(&vma->lock);
-   vma->vm_lock_seq = -1;
-}
-
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
@@ -644,17 +638,17 @@ static inline bool vma_start_read(struct vm_area_struct 
*vma)
if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
return false;
 
-   if (unlikely(down_read_trylock(&vma->lock) == 0))
+   if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
return false;
 
/*
 * Overflow might produce false locked result.
 * False unlocked result is impossible because we modify and check
-* vma->vm_lock_seq under vma->lock protection and mm->mm_lock_seq
+* vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
 * modification invalidates all existing locks.
 */
if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
-   up_read(&vma->lock);
+   up_read(&vma->vm_lock->lock);
return false;
}
return true;
@@ -663,7 +657,7 @@ static inline bool vma_start_read(struct vm_area_struct 
*vma)
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
rcu_read_lock(); /* keeps vma alive till the end of up_read */
-   up_read(&vma->lock);
+   up_read(&vma->vm_lock->lock);
rcu_read_unlock();
 }
 
@@ -681,9 +675,9 @@ static inline void vma_start_write(struct vm_area_struct 
*vma)
if (vma->vm_lock_seq == mm_lock_seq)
return;
 
-   down_write(&vma->lock);
+   down_write(&vma->vm_lock->lock);
vma->vm_lock_seq = mm_lock_seq;
-   up_write(&vma->lock);
+   up_write(&vma->vm_lock->lock);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -720,6 +714,10 @@ static inline void vma_mark_detached(struct vm_area_struct 
*vma,
 
 #endif /* CONFIG_PER_VMA_LOCK */
 
+/*
+ * WARNING: vma_init does not initialize vma->vm_lock.
+ * Use vm_area_alloc()/vm_area_free() if vma needs locking.
+ */
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
static const struct vm_operations_struct dummy_vm_ops = {};
@@ -729,7 +727,6 @@ static inline void vma_init(struct vm_area_struct *vma, 
struct mm_struct *mm)
	vma->vm_ops = &dummy_vm_ops;
	INIT_LIST_HEAD(&vma->anon_vma_chain);
vma_mark_detached(vma, false);
-   vma_init_lock(vma);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 212e7f923a69..30d4f867ae56 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -471,6 +471,10 @@ struct anon_vma_name {
char name[];
 };
 
+struct vma_lock {
+   struct rw_semaphore lock;
+};
+
 /*
  * This struct describes a virtual memory area. There is one of these
  * per VM-area/task. A VM area is any part of the process virtual memory
@@ -510,7 +514,7 @@ struct vm_area_struct {
 
 #ifdef CONFIG_PER_VMA_LOCK
int vm_lock_seq;
-   struct rw_semaphore lock;
+   struct vma_lock *vm_lock;
 
/* Flag to indicate areas detached from the mm->mm_mt tree */
bool detached;
diff --git a/kernel/fork.c b/kernel/fork.c
index 
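
[The kernel/fork.c hunk is truncated in this archive. As context, a
rough sketch of the allocation side implied by the commit message, with
a hypothetical helper and cache name, not the actual patch:]

	static bool vma_lock_alloc(struct vm_area_struct *vma)
	{
		vma->vm_lock = kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL);
		if (!vma->vm_lock)
			return false;

		init_rwsem(&vma->vm_lock->lock);
		vma->vm_lock_seq = -1;
		return true;
	}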

[PATCH v3 34/35] mm/mmap: free vm_area_struct without call_rcu in exit_mmap

2023-02-15 Thread Suren Baghdasaryan
call_rcu() can take a long time when callback offloading is enabled.
Its use in vm_area_free() can cause regressions in the exit path when
multiple VMAs are being freed.
Because exit_mmap() is called only after the last mm user drops its
refcount, the page fault handlers can't be racing with it. Any other
possible users, like oom-reaper or process_mrelease, are already
synchronized using mmap_lock. Therefore exit_mmap() can free VMAs
directly, without the use of call_rcu().
Expose __vm_area_free() and use it from exit_mmap() to avoid possible
call_rcu() floods and performance regressions caused by it.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h |  2 ++
 kernel/fork.c  |  2 +-
 mm/mmap.c  | 11 +++
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3c9167529417..cedef02dfd2b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -256,6 +256,8 @@ void setup_initial_init_mm(void *start_code, void *end_code,
 struct vm_area_struct *vm_area_alloc(struct mm_struct *);
 struct vm_area_struct *vm_area_dup(struct vm_area_struct *);
 void vm_area_free(struct vm_area_struct *);
+/* Use only if VMA has no other users */
+void __vm_area_free(struct vm_area_struct *vma);
 
 #ifndef CONFIG_MMU
 extern struct rb_root nommu_region_tree;
diff --git a/kernel/fork.c b/kernel/fork.c
index a08cc0e2bfde..d0999de82f94 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -480,7 +480,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct 
*orig)
return new;
 }
 
-static void __vm_area_free(struct vm_area_struct *vma)
+void __vm_area_free(struct vm_area_struct *vma)
 {
free_anon_vma_name(vma);
kmem_cache_free(vm_area_cachep, vma);
diff --git a/mm/mmap.c b/mm/mmap.c
index adf40177e68f..d847be615720 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -133,7 +133,7 @@ void unlink_file_vma(struct vm_area_struct *vma)
 /*
  * Close a vm structure and free it.
  */
-static void remove_vma(struct vm_area_struct *vma)
+static void remove_vma(struct vm_area_struct *vma, bool unreachable)
 {
might_sleep();
if (vma->vm_ops && vma->vm_ops->close)
@@ -141,7 +141,10 @@ static void remove_vma(struct vm_area_struct *vma)
if (vma->vm_file)
fput(vma->vm_file);
mpol_put(vma_policy(vma));
-   vm_area_free(vma);
+   if (unreachable)
+   __vm_area_free(vma);
+   else
+   vm_area_free(vma);
 }
 
 static inline struct vm_area_struct *vma_prev_limit(struct vma_iterator *vmi,
@@ -2135,7 +2138,7 @@ static inline void remove_mt(struct mm_struct *mm, struct 
ma_state *mas)
if (vma->vm_flags & VM_ACCOUNT)
nr_accounted += nrpages;
vm_stat_account(mm, vma->vm_flags, -nrpages);
-   remove_vma(vma);
+   remove_vma(vma, false);
}
vm_unacct_memory(nr_accounted);
validate_mm(mm);
@@ -3085,7 +3088,7 @@ void exit_mmap(struct mm_struct *mm)
do {
if (vma->vm_flags & VM_ACCOUNT)
nr_accounted += vma_pages(vma);
-   remove_vma(vma);
+   remove_vma(vma, true);
count++;
cond_resched();
	} while ((vma = mas_find(&mas, ULONG_MAX)) != NULL);
-- 
2.39.1



[PATCH v3 33/35] powerpc/mm: try VMA lock-based page fault handling first

2023-02-15 Thread Suren Baghdasaryan
From: Laurent Dufour 

Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
Copied from "x86/mm: try VMA lock-based page fault handling first"

Signed-off-by: Laurent Dufour 
Signed-off-by: Suren Baghdasaryan 
---
 arch/powerpc/mm/fault.c| 41 ++
 arch/powerpc/platforms/powernv/Kconfig |  1 +
 arch/powerpc/platforms/pseries/Kconfig |  1 +
 3 files changed, 43 insertions(+)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 2bef19cc1b98..c7ae86b04b8a 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -469,6 +469,44 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned 
long address,
if (is_exec)
flags |= FAULT_FLAG_INSTRUCTION;
 
+#ifdef CONFIG_PER_VMA_LOCK
+   if (!(flags & FAULT_FLAG_USER))
+   goto lock_mmap;
+
+   vma = lock_vma_under_rcu(mm, address);
+   if (!vma)
+   goto lock_mmap;
+
+   if (unlikely(access_pkey_error(is_write, is_exec,
+  (error_code & DSISR_KEYFAULT), vma))) {
+   int rc = bad_access_pkey(regs, address, vma);
+
+   vma_end_read(vma);
+   return rc;
+   }
+
+   if (unlikely(access_error(is_write, is_exec, vma))) {
+   int rc = bad_access(regs, address);
+
+   vma_end_read(vma);
+   return rc;
+   }
+
+   fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
+   vma_end_read(vma);
+
+   if (!(fault & VM_FAULT_RETRY)) {
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   goto done;
+   }
+   count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+   if (fault_signal_pending(fault, regs))
+   return user_mode(regs) ? 0 : SIGBUS;
+
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
+
/* When running in the kernel we expect faults to occur only to
 * addresses in user space.  All other faults represent errors in the
 * kernel and should generate an OOPS.  Unfortunately, in the case of an
@@ -545,6 +583,9 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned 
long address,
 
mmap_read_unlock(current->mm);
 
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
if (unlikely(fault & VM_FAULT_ERROR))
return mm_fault_error(regs, address, fault);
 
diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index ae248a161b43..70a46acc70d6 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -16,6 +16,7 @@ config PPC_POWERNV
select PPC_DOORBELL
select MMU_NOTIFIER
select FORCE_SMP
+   select ARCH_SUPPORTS_PER_VMA_LOCK
default y
 
 config OPAL_PRD
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index a3b4d99567cb..e036a04ff1ca 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -21,6 +21,7 @@ config PPC_PSERIES
select HOTPLUG_CPU
select FORCE_SMP
select SWIOTLB
+   select ARCH_SUPPORTS_PER_VMA_LOCK
default y
 
 config PARAVIRT
-- 
2.39.1



[PATCH v3 32/35] arm64/mm: try VMA lock-based page fault handling first

2023-02-15 Thread Suren Baghdasaryan
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.

Signed-off-by: Suren Baghdasaryan 
---
 arch/arm64/Kconfig|  1 +
 arch/arm64/mm/fault.c | 36 
 2 files changed, 37 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c5ccca26a408..9f2c0e352da3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -95,6 +95,7 @@ config ARM64
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_SUPPORTS_PAGE_TABLE_CHECK
+   select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
select ARCH_WANT_DEFAULT_BPF_JIT
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index f4cb0f85ccf4..9e0db5c387e3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -535,6 +535,9 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
unsigned long vm_flags;
unsigned int mm_flags = FAULT_FLAG_DEFAULT;
unsigned long addr = untagged_addr(far);
+#ifdef CONFIG_PER_VMA_LOCK
+   struct vm_area_struct *vma;
+#endif
 
if (kprobe_page_fault(regs, esr))
return 0;
@@ -585,6 +588,36 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
 
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
+#ifdef CONFIG_PER_VMA_LOCK
+   if (!(mm_flags & FAULT_FLAG_USER))
+   goto lock_mmap;
+
+   vma = lock_vma_under_rcu(mm, addr);
+   if (!vma)
+   goto lock_mmap;
+
+   if (!(vma->vm_flags & vm_flags)) {
+   vma_end_read(vma);
+   goto lock_mmap;
+   }
+   fault = handle_mm_fault(vma, addr & PAGE_MASK,
+   mm_flags | FAULT_FLAG_VMA_LOCK, regs);
+   vma_end_read(vma);
+
+   if (!(fault & VM_FAULT_RETRY)) {
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   goto done;
+   }
+   count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+   /* Quick path to respond to signals */
+   if (fault_signal_pending(fault, regs)) {
+   if (!user_mode(regs))
+   goto no_context;
+   return 0;
+   }
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
/*
 * As per x86, we may deadlock here. However, since the kernel only
 * validly references user space from well defined areas of the code,
@@ -628,6 +661,9 @@ static int __kprobes do_page_fault(unsigned long far, 
unsigned long esr,
}
mmap_read_unlock(mm);
 
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
/*
 * Handle the "normal" (no error) case first.
 */
-- 
2.39.1



[PATCH v3 31/35] x86/mm: try VMA lock-based page fault handling first

2023-02-15 Thread Suren Baghdasaryan
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.

Signed-off-by: Suren Baghdasaryan 
---
 arch/x86/Kconfig|  1 +
 arch/x86/mm/fault.c | 36 
 2 files changed, 37 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3604074a878b..3647f7bdb110 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86_64
# Options that are inherently 64-bit kernel only:
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
+   select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_USE_CMPXCHG_LOCKREF
select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a498ae1fbe66..e4399983c50c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -19,6 +19,7 @@
 #include <linux/uaccess.h>		/* faulthandler_disabled()	*/
 #include <linux/efi.h>			/* efi_crash_gracefully_on_page_fault()*/
 #include <linux/mm_types.h>
+#include <linux/mm.h>			/* find_and_lock_vma() */
 
 #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/
 #include <asm/traps.h>			/* dotraplinkage, ...		*/
@@ -1333,6 +1334,38 @@ void do_user_addr_fault(struct pt_regs *regs,
}
 #endif
 
+#ifdef CONFIG_PER_VMA_LOCK
+   if (!(flags & FAULT_FLAG_USER))
+   goto lock_mmap;
+
+   vma = lock_vma_under_rcu(mm, address);
+   if (!vma)
+   goto lock_mmap;
+
+   if (unlikely(access_error(error_code, vma))) {
+   vma_end_read(vma);
+   goto lock_mmap;
+   }
+   fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, 
regs);
+   vma_end_read(vma);
+
+   if (!(fault & VM_FAULT_RETRY)) {
+   count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+   goto done;
+   }
+   count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+   /* Quick path to respond to signals */
+   if (fault_signal_pending(fault, regs)) {
+   if (!user_mode(regs))
+   kernelmode_fixup_or_oops(regs, error_code, address,
+SIGBUS, BUS_ADRERR,
+ARCH_DEFAULT_PKEY);
+   return;
+   }
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
+
/*
 * Kernel-mode access to the user address space should only occur
 * on well-defined single instructions listed in the exception
@@ -1433,6 +1466,9 @@ void do_user_addr_fault(struct pt_regs *regs,
}
 
mmap_read_unlock(mm);
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
if (likely(!(fault & VM_FAULT_ERROR)))
return;
 
-- 
2.39.1



[PATCH v3 30/35] mm: introduce per-VMA lock statistics

2023-02-15 Thread Suren Baghdasaryan
Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra
statistics about handling page faults under VMA lock.
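
With the option enabled, the counters appear in /proc/vmstat under the
names added below (illustrative output; the values are made up):

	$ grep vma_lock /proc/vmstat
	vma_lock_success 84012
	vma_lock_abort 11
	vma_lock_retry 317
	vma_lock_miss 2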

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/vm_event_item.h | 6 ++
 include/linux/vmstat.h| 6 ++
 mm/Kconfig.debug  | 6 ++
 mm/memory.c   | 2 ++
 mm/vmstat.c   | 6 ++
 5 files changed, 26 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 7f5d1caf5890..8abfa1240040 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -149,6 +149,12 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_X86
DIRECT_MAP_LEVEL2_SPLIT,
DIRECT_MAP_LEVEL3_SPLIT,
+#endif
+#ifdef CONFIG_PER_VMA_LOCK_STATS
+   VMA_LOCK_SUCCESS,
+   VMA_LOCK_ABORT,
+   VMA_LOCK_RETRY,
+   VMA_LOCK_MISS,
 #endif
NR_VM_EVENT_ITEMS
 };
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 19cf5b6892ce..fed855bae6d8 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -125,6 +125,12 @@ static inline void vm_events_fold_cpu(int cpu)
 #define count_vm_tlb_events(x, y) do { (void)(y); } while (0)
 #endif
 
+#ifdef CONFIG_PER_VMA_LOCK_STATS
+#define count_vm_vma_lock_event(x) count_vm_event(x)
+#else
+#define count_vm_vma_lock_event(x) do {} while (0)
+#endif
+
 #define __count_zid_vm_events(item, zid, delta) \
__count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta)
 
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index c3547a373c9c..4965a7333a3f 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -279,3 +279,9 @@ config DEBUG_KMEMLEAK_AUTO_SCAN
 
  If unsure, say Y.
 
+config PER_VMA_LOCK_STATS
+   bool "Statistics for per-vma locks"
+   depends on PER_VMA_LOCK
+   default y
+   help
+ Statistics for per-vma locks.
diff --git a/mm/memory.c b/mm/memory.c
index 751aebc1b29f..94194a45ffa7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5272,6 +5272,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct 
mm_struct *mm,
/* Check if the VMA got isolated after we found it */
if (vma->detached) {
vma_end_read(vma);
+   count_vm_vma_lock_event(VMA_LOCK_MISS);
/* The area was replaced with another one */
goto retry;
}
@@ -5280,6 +5281,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct 
mm_struct *mm,
return vma;
 inval:
rcu_read_unlock();
+   count_vm_vma_lock_event(VMA_LOCK_ABORT);
return NULL;
 }
 #endif /* CONFIG_PER_VMA_LOCK */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1ea6a5ce1c41..4f1089a1860e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1399,6 +1399,12 @@ const char * const vmstat_text[] = {
"direct_map_level2_splits",
"direct_map_level3_splits",
 #endif
+#ifdef CONFIG_PER_VMA_LOCK_STATS
+   "vma_lock_success",
+   "vma_lock_abort",
+   "vma_lock_retry",
+   "vma_lock_miss",
+#endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
-- 
2.39.1



[PATCH v3 29/35] mm: prevent userfaults to be handled under per-vma lock

2023-02-15 Thread Suren Baghdasaryan
Due to the possibility of handle_userfault dropping mmap_lock, avoid fault
handling under VMA lock and retry holding mmap_lock. This can be handled
more gracefully in the future.

Signed-off-by: Suren Baghdasaryan 
Suggested-by: Peter Xu 
---
 mm/memory.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 555612d153ad..751aebc1b29f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5254,6 +5254,15 @@ struct vm_area_struct *lock_vma_under_rcu(struct 
mm_struct *mm,
if (!vma_start_read(vma))
goto inval;
 
+   /*
+* Due to the possibility of userfault handler dropping mmap_lock, avoid
+* it for now and fall back to page fault handling under mmap_lock.
+*/
+   if (userfaultfd_armed(vma)) {
+   vma_end_read(vma);
+   goto inval;
+   }
+
/* Check since vm_start/vm_end might change before we lock the VMA */
if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
vma_end_read(vma);
-- 
2.39.1



[PATCH v3 28/35] mm: prevent do_swap_page from handling page faults under VMA lock

2023-02-15 Thread Suren Baghdasaryan
Due to the possibility of do_swap_page dropping mmap_lock, abort fault
handling under VMA lock and retry holding mmap_lock. This can be handled
more gracefully in the future.

Signed-off-by: Suren Baghdasaryan 
Reviewed-by: Laurent Dufour 
---
 mm/memory.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 13369ff15ec1..555612d153ad 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3688,6 +3688,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
if (!pte_unmap_same(vmf))
goto out;
 
+   if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+   ret = VM_FAULT_RETRY;
+   goto out;
+   }
+
entry = pte_to_swp_entry(vmf->orig_pte);
if (unlikely(non_swap_entry(entry))) {
if (is_migration_entry(entry)) {
-- 
2.39.1



[PATCH v3 27/35] mm: add FAULT_FLAG_VMA_LOCK flag

2023-02-15 Thread Suren Baghdasaryan
Add a new flag to distinguish page faults handled under protection of
per-vma lock.

Signed-off-by: Suren Baghdasaryan 
Reviewed-by: Laurent Dufour 
---
 include/linux/mm.h   | 3 ++-
 include/linux/mm_types.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 36172bb38e6b..3c9167529417 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -478,7 +478,8 @@ static inline bool fault_flag_allow_retry_first(enum 
fault_flag flags)
{ FAULT_FLAG_USER,  "USER" }, \
{ FAULT_FLAG_REMOTE,"REMOTE" }, \
{ FAULT_FLAG_INSTRUCTION,   "INSTRUCTION" }, \
-   { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }
+   { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }, \
+   { FAULT_FLAG_VMA_LOCK,  "VMA_LOCK" }
 
 /*
  * vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 939f4f5a1115..212e7f923a69 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1056,6 +1056,7 @@ enum fault_flag {
FAULT_FLAG_INTERRUPTIBLE =  1 << 9,
FAULT_FLAG_UNSHARE =1 << 10,
FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
+   FAULT_FLAG_VMA_LOCK =   1 << 12,
 };
 
 typedef unsigned int __bitwise zap_flags_t;
-- 
2.39.1



[PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set

2023-02-15 Thread Suren Baghdasaryan
When vma->anon_vma is not set, page fault handler will set it by either
reusing anon_vma of an adjacent VMA if VMAs are compatible or by
allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
a compatible adjacent VMA and that requires not only the faulting VMA
to be stable but also the tree structure and other VMAs inside that tree.
Therefore locking just the faulting VMA is not enough for this search.
Fall back to taking mmap_lock when vma->anon_vma is not set. This
situation happens only on the first page fault and should not affect
overall performance.

Signed-off-by: Suren Baghdasaryan 
---
 mm/memory.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 5e1c124552a1..13369ff15ec1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5242,6 +5242,10 @@ struct vm_area_struct *lock_vma_under_rcu(struct 
mm_struct *mm,
if (!vma_is_anonymous(vma))
goto inval;
 
+   /* find_mergeable_anon_vma uses adjacent vmas which are not locked */
+   if (!vma->anon_vma)
+   goto inval;
+
if (!vma_start_read(vma))
goto inval;
 
-- 
2.39.1



[PATCH v3 25/35] mm: introduce lock_vma_under_rcu to be used from arch-specific code

2023-02-15 Thread Suren Baghdasaryan
Introduce the lock_vma_under_rcu function to look up and lock a VMA during
page fault handling. If the VMA is not found, can't be locked, or changes
after being locked, the function returns NULL. The lookup is performed
under RCU protection to prevent the found VMA from being destroyed before
the VMA lock is acquired. VMA lock statistics are updated according to
the results.
For now only anonymous VMAs can be searched this way. In other cases the
function returns NULL.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h |  3 +++
 mm/memory.c| 46 ++
 2 files changed, 49 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3f98344f829c..36172bb38e6b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -701,6 +701,9 @@ static inline void vma_mark_detached(struct vm_area_struct 
*vma, bool detached)
vma->detached = detached;
 }
 
+struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
+ unsigned long address);
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void vma_init_lock(struct vm_area_struct *vma) {}
diff --git a/mm/memory.c b/mm/memory.c
index 8177c59ffd2d..5e1c124552a1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5220,6 +5220,52 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, 
unsigned long address,
 }
 EXPORT_SYMBOL_GPL(handle_mm_fault);
 
+#ifdef CONFIG_PER_VMA_LOCK
+/*
+ * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be
+ * stable and not isolated. If the VMA is not found or is being modified the
+ * function returns NULL.
+ */
+struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
+ unsigned long address)
+{
+   MA_STATE(mas, &mm->mm_mt, address, address);
+   struct vm_area_struct *vma;
+
+   rcu_read_lock();
+retry:
+   vma = mas_walk(&mas);
+   if (!vma)
+   goto inval;
+
+   /* Only anonymous vmas are supported for now */
+   if (!vma_is_anonymous(vma))
+   goto inval;
+
+   if (!vma_start_read(vma))
+   goto inval;
+
+   /* Check since vm_start/vm_end might change before we lock the VMA */
+   if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
+   vma_end_read(vma);
+   goto inval;
+   }
+
+   /* Check if the VMA got isolated after we found it */
+   if (vma->detached) {
+   vma_end_read(vma);
+   /* The area was replaced with another one */
+   goto retry;
+   }
+
+   rcu_read_unlock();
+   return vma;
+inval:
+   rcu_read_unlock();
+   return NULL;
+}
+#endif /* CONFIG_PER_VMA_LOCK */
+
 #ifndef __PAGETABLE_P4D_FOLDED
 /*
  * Allocate p4d page table.
-- 
2.39.1



[PATCH v3 24/35] mm: introduce vma detached flag

2023-02-15 Thread Suren Baghdasaryan
The per-vma locking mechanism searches for a VMA under RCU protection and
then, after locking it, has to ensure it was not removed from the VMA
tree in the meantime. To make this check efficient, introduce a
vma->detached flag to mark VMAs which were removed from the VMA tree.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h   | 11 +++
 include/linux/mm_types.h |  3 +++
 mm/mmap.c|  2 ++
 3 files changed, 16 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f4f702224ec5..3f98344f829c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -693,6 +693,14 @@ static inline void vma_assert_write_locked(struct 
vm_area_struct *vma)
VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), 
vma);
 }
 
+static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
+{
+   /* When detaching vma should be write-locked */
+   if (detached)
+   vma_assert_write_locked(vma);
+   vma->detached = detached;
+}
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void vma_init_lock(struct vm_area_struct *vma) {}
@@ -701,6 +709,8 @@ static inline bool vma_start_read(struct vm_area_struct 
*vma)
 static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
 static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
+static inline void vma_mark_detached(struct vm_area_struct *vma,
+bool detached) {}
 
 #endif /* CONFIG_PER_VMA_LOCK */
 
@@ -712,6 +722,7 @@ static inline void vma_init(struct vm_area_struct *vma, 
struct mm_struct *mm)
vma->vm_mm = mm;
	vma->vm_ops = &dummy_vm_ops;
	INIT_LIST_HEAD(&vma->anon_vma_chain);
+   vma_mark_detached(vma, false);
vma_init_lock(vma);
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e268723eaf44..939f4f5a1115 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -511,6 +511,9 @@ struct vm_area_struct {
 #ifdef CONFIG_PER_VMA_LOCK
int vm_lock_seq;
struct rw_semaphore lock;
+
+   /* Flag to indicate areas detached from the mm->mm_mt tree */
+   bool detached;
 #endif
 
/*
diff --git a/mm/mmap.c b/mm/mmap.c
index 801608726be8..adf40177e68f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -593,6 +593,7 @@ static inline void vma_complete(struct vma_prepare *vp,
 
if (vp->remove) {
 again:
+   vma_mark_detached(vp->remove, true);
if (vp->file) {
uprobe_munmap(vp->remove, vp->remove->vm_start,
  vp->remove->vm_end);
@@ -2267,6 +2268,7 @@ static inline int munmap_sidetree(struct vm_area_struct 
*vma,
if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
return -ENOMEM;
 
+   vma_mark_detached(vma, true);
if (vma->vm_flags & VM_LOCKED)
vma->vm_mm->locked_vm -= vma_pages(vma);
 
-- 
2.39.1



[PATCH v3 23/35] mm/mmap: prevent pagefault handler from racing with mmu_notifier registration

2023-02-15 Thread Suren Baghdasaryan
Page fault handlers might need to fire MMU notifications while a new
notifier is being registered. Modify mm_take_all_locks to write-lock all
VMAs and prevent this race with page fault handlers that would hold VMA
locks. VMAs are locked before i_mmap_rwsem and anon_vma to keep the same
locking order as in page fault handlers.

Signed-off-by: Suren Baghdasaryan 
---
 mm/mmap.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 00f8c5798936..801608726be8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3501,6 +3501,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct 
address_space *mapping)
  * of mm/rmap.c:
  *   - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
  * hugetlb mapping);
+ *   - all vmas marked locked
  *   - all i_mmap_rwsem locks;
  *   - all anon_vma->rwseml
  *
@@ -3523,6 +3524,13 @@ int mm_take_all_locks(struct mm_struct *mm)
 
	mutex_lock(&mm_all_locks_mutex);
 
+   mas_for_each(&mas, vma, ULONG_MAX) {
+   if (signal_pending(current))
+   goto out_unlock;
+   vma_start_write(vma);
+   }
+
+   mas_set(&mas, 0);
	mas_for_each(&mas, vma, ULONG_MAX) {
if (signal_pending(current))
goto out_unlock;
@@ -3612,6 +3620,7 @@ void mm_drop_all_locks(struct mm_struct *mm)
if (vma->vm_file && vma->vm_file->f_mapping)
vm_unlock_mapping(vma->vm_file->f_mapping);
}
+   vma_end_write_all(mm);
 
	mutex_unlock(&mm_all_locks_mutex);
 }
-- 
2.39.1



[PATCH v3 22/35] kernel/fork: assert no VMA readers during its destruction

2023-02-15 Thread Suren Baghdasaryan
Assert there are no holders of VMA lock for reading when it is about to be
destroyed.

Signed-off-by: Suren Baghdasaryan 
---
 kernel/fork.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index 9141427a98b2..a08cc0e2bfde 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -491,6 +491,9 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 {
struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
  vm_rcu);
+
+   /* The vma should not be locked while being destroyed. */
+   VM_BUG_ON_VMA(rwsem_is_locked(&vma->lock), vma);
__vm_area_free(vma);
 }
 #endif
-- 
2.39.1



[PATCH v3 21/35] mm/mmap: write-lock adjacent VMAs if they can grow into unmapped area

2023-02-15 Thread Suren Baghdasaryan
While unmapping VMAs, adjacent VMAs might be able to grow into the area
being unmapped. In such cases write-lock adjacent VMAs to prevent this
growth.

Signed-off-by: Suren Baghdasaryan 
---
 mm/mmap.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 118b2246bba9..00f8c5798936 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2399,11 +2399,13 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct 
vm_area_struct *vma,
 * down_read(mmap_lock) and collide with the VMA we are about to unmap.
 */
if (downgrade) {
-   if (next && (next->vm_flags & VM_GROWSDOWN))
+   if (next && (next->vm_flags & VM_GROWSDOWN)) {
+   vma_start_write(next);
downgrade = false;
-   else if (prev && (prev->vm_flags & VM_GROWSUP))
+   } else if (prev && (prev->vm_flags & VM_GROWSUP)) {
+   vma_start_write(prev);
downgrade = false;
-   else
+   } else
mmap_write_downgrade(mm);
}
 
-- 
2.39.1



[PATCH v3 20/35] mm: conditionally write-lock VMA in free_pgtables

2023-02-15 Thread Suren Baghdasaryan
Normally free_pgtables needs to lock affected VMAs except for the case
when VMAs were isolated under VMA write-lock. munmap() does just that,
isolating while holding appropriate locks and then downgrading mmap_lock
and dropping per-VMA locks before freeing page tables.
Add a parameter to free_pgtables for such scenario.

Signed-off-by: Suren Baghdasaryan 
---
 mm/internal.h | 2 +-
 mm/memory.c   | 6 +-
 mm/mmap.c | 5 +++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index fc01fd092ea5..400c302fbf47 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -105,7 +105,7 @@ void folio_activate(struct folio *folio);
 
 void free_pgtables(struct mmu_gather *tlb, struct maple_tree *mt,
   struct vm_area_struct *start_vma, unsigned long floor,
-  unsigned long ceiling);
+  unsigned long ceiling, bool mm_wr_locked);
 void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte);
 
 struct zap_details;
diff --git a/mm/memory.c b/mm/memory.c
index f456f3b5049c..8177c59ffd2d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -348,7 +348,7 @@ void free_pgd_range(struct mmu_gather *tlb,
 
 void free_pgtables(struct mmu_gather *tlb, struct maple_tree *mt,
   struct vm_area_struct *vma, unsigned long floor,
-  unsigned long ceiling)
+  unsigned long ceiling, bool mm_wr_locked)
 {
MA_STATE(mas, mt, vma->vm_end, vma->vm_end);
 
@@ -366,6 +366,8 @@ void free_pgtables(struct mmu_gather *tlb, struct 
maple_tree *mt,
 * Hide vma from rmap and truncate_pagecache before freeing
 * pgtables
 */
+   if (mm_wr_locked)
+   vma_start_write(vma);
unlink_anon_vmas(vma);
unlink_file_vma(vma);
 
@@ -380,6 +382,8 @@ void free_pgtables(struct mmu_gather *tlb, struct 
maple_tree *mt,
   && !is_vm_hugetlb_page(next)) {
vma = next;
			next = mas_find(&mas, ceiling - 1);
+   if (mm_wr_locked)
+   vma_start_write(vma);
unlink_anon_vmas(vma);
unlink_file_vma(vma);
}
diff --git a/mm/mmap.c b/mm/mmap.c
index aaa359a147b2..118b2246bba9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2157,7 +2157,8 @@ static void unmap_region(struct mm_struct *mm, struct 
maple_tree *mt,
update_hiwater_rss(mm);
	unmap_vmas(&tlb, mt, vma, start, end, mm_wr_locked);
	free_pgtables(&tlb, mt, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
-next ? next->vm_start : USER_PGTABLES_CEILING);
+next ? next->vm_start : USER_PGTABLES_CEILING,
+mm_wr_locked);
	tlb_finish_mmu(&tlb);
 }
 
@@ -3069,7 +3070,7 @@ void exit_mmap(struct mm_struct *mm)
mmap_write_lock(mm);
	mt_clear_in_rcu(&mm->mm_mt);
	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
- USER_PGTABLES_CEILING);
+ USER_PGTABLES_CEILING, true);
	tlb_finish_mmu(&tlb);
 
/*
-- 
2.39.1



[PATCH v3 19/35] mm: write-lock VMAs before removing them from VMA tree

2023-02-15 Thread Suren Baghdasaryan
Write-locking VMAs before isolating them ensures that page fault
handlers don't operate on isolated VMAs.

Signed-off-by: Suren Baghdasaryan 
---
 mm/mmap.c  | 1 +
 mm/nommu.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 0eaa3d1a6cd1..aaa359a147b2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2261,6 +2261,7 @@ int split_vma(struct vma_iterator *vmi, struct 
vm_area_struct *vma,
 static inline int munmap_sidetree(struct vm_area_struct *vma,
   struct ma_state *mas_detach)
 {
+   vma_start_write(vma);
mas_set_range(mas_detach, vma->vm_start, vma->vm_end - 1);
if (mas_store_gfp(mas_detach, vma, GFP_KERNEL))
return -ENOMEM;
diff --git a/mm/nommu.c b/mm/nommu.c
index 57ba243c6a37..2ab162d773e2 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -588,6 +588,7 @@ static int delete_vma_from_mm(struct vm_area_struct *vma)
   current->pid);
return -ENOMEM;
}
+   vma_start_write(vma);
cleanup_vma_from_mm(vma);
 
/* remove from the MM's tree and list */
@@ -1519,6 +1520,10 @@ void exit_mmap(struct mm_struct *mm)
 */
mmap_write_lock(mm);
for_each_vma(vmi, vma) {
+   /*
+* No need to lock VMA because this is the only mm user and no
+* page fault handler can race with it.
+*/
cleanup_vma_from_mm(vma);
delete_vma(mm, vma);
cond_resched();
-- 
2.39.1



[PATCH v3 18/35] mm/mremap: write-lock VMA while remapping it to a new address range

2023-02-15 Thread Suren Baghdasaryan
Write-lock the VMA before copying it, and write-lock the new VMA when
copy_vma produces one.

Signed-off-by: Suren Baghdasaryan 
Reviewed-by: Laurent Dufour 
---
 mm/mmap.c   | 1 +
 mm/mremap.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index f079e5bbcd57..0eaa3d1a6cd1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3202,6 +3202,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct 
**vmap,
get_file(new_vma->vm_file);
if (new_vma->vm_ops && new_vma->vm_ops->open)
new_vma->vm_ops->open(new_vma);
+   vma_start_write(new_vma);
if (vma_link(mm, new_vma))
goto out_vma_link;
*need_rmap_locks = false;
diff --git a/mm/mremap.c b/mm/mremap.c
index 411a85682b58..dd541e59edda 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -623,6 +623,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
return -ENOMEM;
}
 
+   vma_start_write(vma);
new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT);
	new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff,
			   &need_rmap_locks);
-- 
2.39.1



[PATCH v3 17/35] mm/mmap: write-lock VMA before shrinking or expanding it

2023-02-15 Thread Suren Baghdasaryan
vma_expand and vma_shrink change VMA boundaries. Expansion might also
result in freeing of an adjacent VMA. Write-lock affected VMAs to prevent
concurrent page faults.

Signed-off-by: Suren Baghdasaryan 
---
 mm/mmap.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index ec2f8d0af280..f079e5bbcd57 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -674,6 +674,9 @@ int vma_expand(struct vma_iterator *vmi, struct 
vm_area_struct *vma,
ret = dup_anon_vma(vma, next);
if (ret)
return ret;
+
+	/* Lock the VMA before removing it */
+   vma_start_write(next);
}
 
	init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
@@ -686,6 +689,7 @@ int vma_expand(struct vma_iterator *vmi, struct 
vm_area_struct *vma,
if (vma_iter_prealloc(vmi))
goto nomem;
 
+   vma_start_write(vma);
vma_adjust_trans_huge(vma, start, end, 0);
/* VMA iterator points to previous, so set to start if necessary */
if (vma_iter_addr(vmi) != start)
@@ -725,6 +729,7 @@ int vma_shrink(struct vma_iterator *vmi, struct 
vm_area_struct *vma,
if (vma_iter_prealloc(vmi))
return -ENOMEM;
 
+   vma_start_write(vma);
	init_vma_prep(&vp, vma);
	vma_adjust_trans_huge(vma, start, end, 0);
	vma_prepare(&vp);
-- 
2.39.1



[PATCH v3 16/35] mm/mmap: write-lock VMAs before merging, splitting or expanding them

2023-02-15 Thread Suren Baghdasaryan
Decisions about whether VMAs can be merged, split or expanded must be
made while VMAs are protected from the changes which can affect that
decision. For example, vma_merge() uses vma->anon_vma in its decision
whether the VMA can be merged. Meanwhile, the page fault handler changes
vma->anon_vma during a COW operation.
Write-lock all VMAs which might be affected by a merge or split operation
before deciding how such operations should be performed.

Signed-off-by: Suren Baghdasaryan 
---
 mm/mmap.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index c5f2ddf17b87..ec2f8d0af280 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -269,8 +269,11 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 */
	vma_iter_init(&vmi, mm, oldbrk);
	next = vma_find(&vmi, newbrk + PAGE_SIZE + stack_guard_gap);
-   if (next && newbrk + PAGE_SIZE > vm_start_gap(next))
-   goto out;
+   if (next) {
+   vma_start_write(next);
+   if (newbrk + PAGE_SIZE > vm_start_gap(next))
+   goto out;
+   }
 
	brkvma = vma_prev_limit(&vmi, mm->start_brk);
/* Ok, looks good - let it rip. */
@@ -912,10 +915,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator 
*vmi, struct mm_struct *mm,
if (vm_flags & VM_SPECIAL)
return NULL;
 
+   if (prev)
+   vma_start_write(prev);
next = find_vma(mm, prev ? prev->vm_end : 0);
+   if (next)
+   vma_start_write(next);
mid = next;
-   if (next && next->vm_end == end)/* cases 6, 7, 8 */
+   if (next && next->vm_end == end) {  /* cases 6, 7, 8 */
next = find_vma(mm, next->vm_end);
+   if (next)
+   vma_start_write(next);
+   }
 
/* verify some invariant that must be enforced by the caller */
VM_WARN_ON(prev && addr <= prev->vm_start);
@@ -2163,6 +2173,7 @@ int __split_vma(struct vma_iterator *vmi, struct 
vm_area_struct *vma,
WARN_ON(vma->vm_start >= addr);
WARN_ON(vma->vm_end <= addr);
 
+   vma_start_write(vma);
if (vma->vm_ops && vma->vm_ops->may_split) {
err = vma->vm_ops->may_split(vma, addr);
if (err)
@@ -2518,6 +2529,8 @@ unsigned long mmap_region(struct file *file, unsigned 
long addr,
 
/* Attempt to expand an old mapping */
/* Check next */
+   if (next)
+   vma_start_write(next);
if (next && next->vm_start == end && !vma_policy(next) &&
can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen,
 NULL_VM_UFFD_CTX, NULL)) {
@@ -2527,6 +2540,8 @@ unsigned long mmap_region(struct file *file, unsigned 
long addr,
}
 
/* Check prev */
+   if (prev)
+   vma_start_write(prev);
if (prev && prev->vm_end == addr && !vma_policy(prev) &&
(vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file,
   pgoff, vma->vm_userfaultfd_ctx, NULL) :
@@ -2900,6 +2915,8 @@ static int do_brk_flags(struct vma_iterator *vmi, struct 
vm_area_struct *vma,
if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
return -ENOMEM;
 
+   if (vma)
+   vma_start_write(vma);
/*
 * Expand the existing vma if possible; Note that singular lists do not
 * occur after forking, so the expand will only happen on new VMAs.
-- 
2.39.1



[PATCH v3 15/35] mm/khugepaged: write-lock VMA while collapsing a huge page

2023-02-15 Thread Suren Baghdasaryan
Protect the VMA from concurrent page fault handlers while collapsing a huge
page. The page fault handler needs a stable PMD to use the PTL and relies on
the per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(),
set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will
not be detected by a page fault handler without proper locking.

Before this patch, page tables can be walked under any one of the
mmap_lock, the mapping lock, and the anon_vma lock; so when khugepaged
unlinks and frees page tables, it must ensure that all of those either
are locked or don't exist. This patch adds a fourth lock under which
page tables can be traversed, and so khugepaged must also lock out that
one.
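
For illustration, a sketch (mine, not from the patch) of the lock set the
collapse path ends up holding, so that all four classes of page table
walkers are excluded:

	/* Illustrative ordering; the real code is in the hunks below. */
	mmap_write_lock(mm);			/* excludes mmap_lock walkers */
	vma_start_write(vma);			/* excludes per-VMA lock walkers */
	anon_vma_lock_write(vma->anon_vma);	/* excludes anon rmap walkers */
	/* mapping->i_mmap_rwsem is taken where file rmap walks are possible */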

Signed-off-by: Suren Baghdasaryan 
---
 mm/khugepaged.c |  5 +
 mm/rmap.c   | 31 ---
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8dbc39896811..d9376acf1fbe 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1140,6 +1140,7 @@ static int collapse_huge_page(struct mm_struct *mm, 
unsigned long address,
if (result != SCAN_SUCCEED)
goto out_up_write;
 
+   vma_start_write(vma);
anon_vma_lock_write(vma->anon_vma);
 
mmu_notifier_range_init(, MMU_NOTIFY_CLEAR, 0, mm, address,
@@ -1607,6 +1608,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, 
unsigned long addr,
goto drop_hpage;
}
 
+   /* Lock the vma before taking i_mmap and page table locks */
+   vma_start_write(vma);
+
/*
 * We need to lock the mapping so that from here on, only GUP-fast and
 * hardware page walks can access the parts of the page tables that
@@ -1812,6 +1816,7 @@ static int retract_page_tables(struct address_space 
*mapping, pgoff_t pgoff,
result = SCAN_PTE_UFFD_WP;
goto unlock_next;
}
+   vma_start_write(vma);
collapse_and_free_pmd(mm, vma, addr, pmd);
if (!cc->is_khugepaged && is_target)
result = set_huge_pmd(vma, addr, pmd, hpage);
diff --git a/mm/rmap.c b/mm/rmap.c
index 15ae24585fc4..8e1a2ad9ca53 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -25,21 +25,22 @@
  * mapping->invalidate_lock (in filemap_fault)
  *   page->flags PG_locked (lock_page)
  * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share, see hugetlbfs below)
- *   mapping->i_mmap_rwsem
- * anon_vma->rwsem
- *   mm->page_table_lock or pte_lock
- * swap_lock (in swap_duplicate, swap_info_get)
- *   mmlist_lock (in mmput, drain_mmlist and others)
- *   mapping->private_lock (in block_dirty_folio)
- * folio_lock_memcg move_lock (in block_dirty_folio)
- *   i_pages lock (widely used)
- * lruvec->lru_lock (in folio_lruvec_lock_irq)
- *   inode->i_lock (in set_page_dirty's __mark_inode_dirty)
- *   bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
- * sb_lock (within inode_lock in fs/fs-writeback.c)
- * i_pages lock (widely used, in set_page_dirty,
- *   in arch-dependent flush_dcache_mmap_lock,
- *   within bdi.wb->list_lock in __sync_single_inode)
+ *   vma_start_write
+ * mapping->i_mmap_rwsem
+ *   anon_vma->rwsem
+ * mm->page_table_lock or pte_lock
+ *   swap_lock (in swap_duplicate, swap_info_get)
+ * mmlist_lock (in mmput, drain_mmlist and others)
+ * mapping->private_lock (in block_dirty_folio)
+ *   folio_lock_memcg move_lock (in block_dirty_folio)
+ * i_pages lock (widely used)
+ *   lruvec->lru_lock (in folio_lruvec_lock_irq)
+ * inode->i_lock (in set_page_dirty's __mark_inode_dirty)
+ * bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
+ *   sb_lock (within inode_lock in fs/fs-writeback.c)
+ *   i_pages lock (widely used, in set_page_dirty,
+ * in arch-dependent flush_dcache_mmap_lock,
+ * within bdi.wb->list_lock in __sync_single_inode)
  *
  * anon_vma->rwsem,mapping->i_mmap_rwsem   (memory_failure, collect_procs_anon)
  *   ->tasklist_lock
-- 
2.39.1



[PATCH v3 14/35] mm/mmap: move VMA locking before vma_adjust_trans_huge call

2023-02-15 Thread Suren Baghdasaryan
vma_adjust_trans_huge() modifies the VMA and such modifications should
be done after the VMA is marked as being written. Therefore move the VMA
flag modifications before vma_adjust_trans_huge() so that the VMA is
marked before all these modifications.

Signed-off-by: Suren Baghdasaryan 
---
 mm/mmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 06c0f6686de8..c5f2ddf17b87 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2910,11 +2910,12 @@ static int do_brk_flags(struct vma_iterator *vmi, 
struct vm_area_struct *vma,
if (vma_iter_prealloc(vmi))
goto unacct_fail;
 
+   /* Set flags first to implicitly lock the VMA before updates */
+   vm_flags_set(vma, VM_SOFTDIRTY);
vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
init_vma_prep(, vma);
vma_prepare();
vma->vm_end = addr + len;
-   vm_flags_set(vma, VM_SOFTDIRTY);
vma_iter_store(vmi, vma);
 
	vma_complete(&vp, vmi, mm);
-- 
2.39.1



[PATCH v3 13/35] mm: mark VMA as being written when changing vm_flags

2023-02-15 Thread Suren Baghdasaryan
Updates to vm_flags have to be done with the VMA marked as being written,
to prevent concurrent page faults or other modifications.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a056ee170e34..f4f702224ec5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -726,28 +726,28 @@ static inline void vm_flags_init(struct vm_area_struct 
*vma,
 static inline void vm_flags_reset(struct vm_area_struct *vma,
  vm_flags_t flags)
 {
-   mmap_assert_write_locked(vma->vm_mm);
+   vma_start_write(vma);
vm_flags_init(vma, flags);
 }
 
 static inline void vm_flags_reset_once(struct vm_area_struct *vma,
   vm_flags_t flags)
 {
-   mmap_assert_write_locked(vma->vm_mm);
+   vma_start_write(vma);
WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags);
 }
 
 static inline void vm_flags_set(struct vm_area_struct *vma,
vm_flags_t flags)
 {
-   mmap_assert_write_locked(vma->vm_mm);
+   vma_start_write(vma);
ACCESS_PRIVATE(vma, __vm_flags) |= flags;
 }
 
 static inline void vm_flags_clear(struct vm_area_struct *vma,
  vm_flags_t flags)
 {
-   mmap_assert_write_locked(vma->vm_mm);
+   vma_start_write(vma);
ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
 }
 
@@ -768,7 +768,7 @@ static inline void __vm_flags_mod(struct vm_area_struct 
*vma,
 static inline void vm_flags_mod(struct vm_area_struct *vma,
vm_flags_t set, vm_flags_t clear)
 {
-   mmap_assert_write_locked(vma->vm_mm);
+   vma_start_write(vma);
__vm_flags_mod(vma, set, clear);
 }
 
-- 
2.39.1



[PATCH v3 12/35] mm: add per-VMA lock and helper functions to control it

2023-02-15 Thread Suren Baghdasaryan
Introduce per-VMA locking. The lock implementation relies on
per-vma and per-mm sequence counters to note exclusive locking:
  - read lock - (implemented by vma_start_read) requires the vma
(vm_lock_seq) and mm (mm_lock_seq) sequence counters to differ.
If they match then there must be a vma exclusive lock held somewhere.
  - read unlock - (implemented by vma_end_read) is a trivial vma->lock
unlock.
  - write lock - (vma_start_write) requires the mmap_lock to be held
exclusively and the current mm counter is assigned to the vma counter.
This will allow multiple vmas to be locked under a single mmap_lock
write lock (e.g. during vma merging). The vma counter is modified
under exclusive vma lock.
  - write unlock - (vma_end_write_all) is a batch release of all vma
locks held. It doesn't pair with a specific vma_start_write! It is
done before exclusive mmap_lock is released by incrementing mm
sequence counter (mm_lock_seq).
  - write downgrade - if the mmap_lock is downgraded to the read lock, all
vma write locks are released as well (effectively the same as write
unlock).
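
For illustration, a minimal usage sketch (mine, not part of the patch) of
how an mm writer combines these primitives:

	mmap_write_lock(mm);
	vma_start_write(vma);		/* mark this VMA write-locked */
	vma_start_write(next);		/* several VMAs can be marked at once */
	/* ... modify the VMAs ... */
	mmap_write_unlock(mm);		/* vma_end_write_all(): incrementing
					 * mm->mm_lock_seq unlocks them all */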

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm.h| 82 +++
 include/linux/mm_types.h  |  8 
 include/linux/mmap_lock.h | 13 +++
 kernel/fork.c |  4 ++
 mm/init-mm.c  |  3 ++
 5 files changed, 110 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2992a2d55aee..a056ee170e34 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -623,6 +623,87 @@ struct vm_operations_struct {
  unsigned long addr);
 };
 
+#ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_init_lock(struct vm_area_struct *vma)
+{
	init_rwsem(&vma->lock);
+   vma->vm_lock_seq = -1;
+}
+
+/*
+ * Try to read-lock a vma. The function is allowed to occasionally yield false
+ * locked result to avoid performance overhead, in which case we fall back to
+ * using mmap_lock. The function should never yield false unlocked result.
+ */
+static inline bool vma_start_read(struct vm_area_struct *vma)
+{
+   /* Check before locking. A race might cause false locked result. */
+   if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+   return false;
+
	if (unlikely(down_read_trylock(&vma->lock) == 0))
+   return false;
+
+   /*
+* Overflow might produce false locked result.
+* False unlocked result is impossible because we modify and check
+* vma->vm_lock_seq under vma->lock protection and mm->mm_lock_seq
+* modification invalidates all existing locks.
+*/
+   if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
		up_read(&vma->lock);
+   return false;
+   }
+   return true;
+}
+
+static inline void vma_end_read(struct vm_area_struct *vma)
+{
+   rcu_read_lock(); /* keeps vma alive till the end of up_read */
	up_read(&vma->lock);
+   rcu_read_unlock();
+}
+
+static inline void vma_start_write(struct vm_area_struct *vma)
+{
+   int mm_lock_seq;
+
+   mmap_assert_write_locked(vma->vm_mm);
+
+   /*
+* current task is holding mmap_write_lock, both vma->vm_lock_seq and
+* mm->mm_lock_seq can't be concurrently modified.
+*/
+   mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
+   if (vma->vm_lock_seq == mm_lock_seq)
+   return;
+
	down_write(&vma->lock);
	vma->vm_lock_seq = mm_lock_seq;
	up_write(&vma->lock);
+}
+
+static inline void vma_assert_write_locked(struct vm_area_struct *vma)
+{
+   mmap_assert_write_locked(vma->vm_mm);
+   /*
+* current task is holding mmap_write_lock, both vma->vm_lock_seq and
+* mm->mm_lock_seq can't be concurrently modified.
+*/
	VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), vma);
+}
+
+#else /* CONFIG_PER_VMA_LOCK */
+
+static inline void vma_init_lock(struct vm_area_struct *vma) {}
+static inline bool vma_start_read(struct vm_area_struct *vma)
+   { return false; }
+static inline void vma_end_read(struct vm_area_struct *vma) {}
+static inline void vma_start_write(struct vm_area_struct *vma) {}
+static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
+
+#endif /* CONFIG_PER_VMA_LOCK */
+
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
static const struct vm_operations_struct dummy_vm_ops = {};
@@ -631,6 +712,7 @@ static inline void vma_init(struct vm_area_struct *vma, 
struct mm_struct *mm)
vma->vm_mm = mm;
	vma->vm_ops = &dummy_vm_ops;
	INIT_LIST_HEAD(&vma->anon_vma_chain);
+   vma_init_lock(vma);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fb4e2afad787..e268723eaf44 100644
--- a/include/linux/mm_types.h

[PATCH v3 11/35] mm: move mmap_lock assert function definitions

2023-02-15 Thread Suren Baghdasaryan
Move mmap_lock assert function definitions up so that they can be used
by other mmap_lock routines.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mmap_lock.h | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 96e113e23d04..e49ba91bb1f0 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -60,6 +60,18 @@ static inline void __mmap_lock_trace_released(struct 
mm_struct *mm, bool write)
 
 #endif /* CONFIG_TRACING */
 
+static inline void mmap_assert_locked(struct mm_struct *mm)
+{
+	lockdep_assert_held(&mm->mmap_lock);
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+}
+
+static inline void mmap_assert_write_locked(struct mm_struct *mm)
+{
+	lockdep_assert_held_write(&mm->mmap_lock);
+	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+}
+
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
	init_rwsem(&mm->mmap_lock);
@@ -150,18 +162,6 @@ static inline void mmap_read_unlock_non_owner(struct 
mm_struct *mm)
	up_read_non_owner(&mm->mmap_lock);
 }
 
-static inline void mmap_assert_locked(struct mm_struct *mm)
-{
-	lockdep_assert_held(&mm->mmap_lock);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
-}
-
-static inline void mmap_assert_write_locked(struct mm_struct *mm)
-{
-	lockdep_assert_held_write(&mm->mmap_lock);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
-}
-
 static inline int mmap_lock_is_contended(struct mm_struct *mm)
 {
return rwsem_is_contended(>mmap_lock);
-- 
2.39.1



[PATCH v3 10/35] mm: rcu safe VMA freeing

2023-02-15 Thread Suren Baghdasaryan
From: Michel Lespinasse 

This prepares for page fault handling under VMA lock, looking up VMAs
under protection of an RCU read lock, instead of the usual mmap read lock.
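
A sketch (mine) of why the freeing must be RCU-deferred:

	/* Page fault path, simplified: */
	rcu_read_lock();
	vma = mas_walk(&mas);	/* may race with a concurrent munmap() */
	/* vma cannot be freed from under us here, because vm_area_free()
	 * now goes through call_rcu() instead of freeing immediately */
	rcu_read_unlock();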

Signed-off-by: Michel Lespinasse 
Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm_types.h | 13 ++---
 kernel/fork.c| 20 +++-
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3fd5305dbbf9..fb4e2afad787 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -480,9 +480,16 @@ struct anon_vma_name {
 struct vm_area_struct {
/* The first cache line has the info for VMA tree walking. */
 
-   unsigned long vm_start; /* Our start address within vm_mm. */
-   unsigned long vm_end;   /* The first byte after our end address
-  within vm_mm. */
+   union {
+   struct {
+   /* VMA covers [vm_start; vm_end) addresses within mm */
+   unsigned long vm_start;
+   unsigned long vm_end;
+   };
+#ifdef CONFIG_PER_VMA_LOCK
+   struct rcu_head vm_rcu; /* Used for deferred freeing. */
+#endif
+   };
 
struct mm_struct *vm_mm;/* The address space we belong to. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 5f23d5e03362..314d51eb91da 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -479,12 +479,30 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct 
*orig)
return new;
 }
 
-void vm_area_free(struct vm_area_struct *vma)
+static void __vm_area_free(struct vm_area_struct *vma)
 {
free_anon_vma_name(vma);
kmem_cache_free(vm_area_cachep, vma);
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+static void vm_area_free_rcu_cb(struct rcu_head *head)
+{
+   struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
+ vm_rcu);
+   __vm_area_free(vma);
+}
+#endif
+
+void vm_area_free(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_PER_VMA_LOCK
	call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
+#else
+   __vm_area_free(vma);
+#endif
+}
+
 static void account_kernel_stack(struct task_struct *tsk, int account)
 {
if (IS_ENABLED(CONFIG_VMAP_STACK)) {
-- 
2.39.1



[PATCH v3 09/35] mm: introduce CONFIG_PER_VMA_LOCK

2023-02-15 Thread Suren Baghdasaryan
This configuration variable will be used to build the support for VMA
locking during page fault handling.

This is enabled on supported architectures with SMP and MMU set.

The architecture support is needed since the page fault handler is called
from the architecture's page faulting code which needs modifications to
handle faults under VMA lock.

Signed-off-by: Suren Baghdasaryan 
---
 mm/Kconfig | 12 
 1 file changed, 12 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index ca98b2072df5..2e4a7e61768a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1211,6 +1211,18 @@ config LRU_GEN_STATS
  This option has a per-memcg and per-node memory overhead.
 # }
 
+config ARCH_SUPPORTS_PER_VMA_LOCK
+   def_bool n
+
+config PER_VMA_LOCK
+   def_bool y
+   depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
+   help
+ Allow per-vma locking during page fault handling.
+
+ This feature allows locking each virtual memory area separately when
+ handling page faults instead of taking mmap_lock.
+
 source "mm/damon/Kconfig"
 
 endmenu
-- 
2.39.1



[PATCH v3 08/35] mm: Enable maple tree RCU mode by default.

2023-02-15 Thread Suren Baghdasaryan
From: "Liam R. Howlett" 

Use the maple tree in RCU mode for VMA tracking.  This is necessary for
the use of per-VMA locking.  RCU mode is enabled by default but disabled
when exiting an mm and for the new tree during a fork.

Also enable RCU for the tree used in munmap operations to ensure the
nodes remain valid for readers.

Signed-off-by: Liam R. Howlett 
Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm_types.h | 3 ++-
 kernel/fork.c| 3 +++
 mm/mmap.c| 4 +++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3cd60243c625..3fd5305dbbf9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -768,7 +768,8 @@ struct mm_struct {
unsigned long cpu_bitmap[];
 };
 
-#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
+#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \
+			 MT_FLAGS_USE_RCU)
 extern struct mm_struct init_mm;
 
 /* Pointer magic because the dynamic array size confuses some compilers. */
diff --git a/kernel/fork.c b/kernel/fork.c
index 5e3029ea8e1e..5f23d5e03362 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -617,6 +617,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
if (retval)
goto out;
 
+   mt_clear_in_rcu(vmi.mas.tree);
for_each_vma(old_vmi, mpnt) {
struct file *file;
 
@@ -700,6 +701,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
retval = arch_dup_mmap(oldmm, mm);
 loop_out:
	vma_iter_free(&vmi);
+   if (!retval)
+   mt_set_in_rcu(vmi.mas.tree);
 out:
mmap_write_unlock(mm);
flush_tlb_mm(oldmm);
diff --git a/mm/mmap.c b/mm/mmap.c
index 20f21f0949dd..06c0f6686de8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2277,7 +2277,8 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct 
vm_area_struct *vma,
int count = 0;
int error = -ENOMEM;
	MA_STATE(mas_detach, &mt_detach, 0, 0);
-	mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
+	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags &
+		      (MT_FLAGS_LOCK_MASK | MT_FLAGS_USE_RCU));
	mt_set_external_lock(&mt_detach, &mm->mmap_lock);
 
/*
@@ -3042,6 +3043,7 @@ void exit_mmap(struct mm_struct *mm)
 */
	set_bit(MMF_OOM_SKIP, &mm->flags);
mmap_write_lock(mm);
+	mt_clear_in_rcu(&mm->mm_mt);
	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
		      USER_PGTABLES_CEILING);
	tlb_finish_mmu(&tlb);
-- 
2.39.1



[PATCH v3 07/35] maple_tree: Add RCU lock checking to rcu callback functions

2023-02-15 Thread Suren Baghdasaryan
From: "Liam R. Howlett" 

Dereferencing RCU objects within the RCU callback without the RCU check
has caused lockdep to complain.  Fix the RCU dereferencing by using the
RCU callback lock to ensure the operation is safe.

Also stop creating a new lock to use for dereferencing during
destruction of the tree or subtree.  Instead, pass through a pointer to
the tree that has the lock that is held for RCU dereferencing checking.
It also does not make sense to use the maple state in the freeing
scenario as the tree walk is a special case where the tree no longer has
the normal encodings and parent pointers.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Reported-by: Suren Baghdasaryan 
Signed-off-by: Liam R. Howlett 
---
 lib/maple_tree.c | 188 ---
 1 file changed, 96 insertions(+), 92 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 8ad2d1669fad..2be86368237d 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -824,6 +824,11 @@ static inline void *mt_slot(const struct maple_tree *mt,
return rcu_dereference_check(slots[offset], mt_locked(mt));
 }
 
+static inline void *mt_slot_locked(struct maple_tree *mt, void __rcu **slots,
+  unsigned char offset)
+{
+   return rcu_dereference_protected(slots[offset], mt_locked(mt));
+}
 /*
  * mas_slot_locked() - Get the slot value when holding the maple tree lock.
  * @mas: The maple state
@@ -835,7 +840,7 @@ static inline void *mt_slot(const struct maple_tree *mt,
 static inline void *mas_slot_locked(struct ma_state *mas, void __rcu **slots,
   unsigned char offset)
 {
-   return rcu_dereference_protected(slots[offset], mt_locked(mas->tree));
+   return mt_slot_locked(mas->tree, slots, offset);
 }
 
 /*
@@ -907,34 +912,35 @@ static inline void ma_set_meta(struct maple_node *mn, 
enum maple_type mt,
 }
 
 /*
- * mas_clear_meta() - clear the metadata information of a node, if it exists
- * @mas: The maple state
+ * mt_clear_meta() - clear the metadata information of a node, if it exists
+ * @mt: The maple tree
  * @mn: The maple node
- * @mt: The maple node type
+ * @type: The maple node type
  * @offset: The offset of the highest sub-gap in this node.
  * @end: The end of the data in this node.
  */
-static inline void mas_clear_meta(struct ma_state *mas, struct maple_node *mn,
- enum maple_type mt)
+static inline void mt_clear_meta(struct maple_tree *mt, struct maple_node *mn,
+ enum maple_type type)
 {
struct maple_metadata *meta;
unsigned long *pivots;
void __rcu **slots;
void *next;
 
-   switch (mt) {
+   switch (type) {
case maple_range_64:
pivots = mn->mr64.pivot;
if (unlikely(pivots[MAPLE_RANGE64_SLOTS - 2])) {
slots = mn->mr64.slot;
-   next = mas_slot_locked(mas, slots,
-  MAPLE_RANGE64_SLOTS - 1);
-			if (unlikely((mte_to_node(next) && mte_node_type(next))))
-				return; /* The last slot is a node, no metadata */
+   next = mt_slot_locked(mt, slots,
+ MAPLE_RANGE64_SLOTS - 1);
+			if (unlikely((mte_to_node(next) &&
+				      mte_node_type(next))))
+				return; /* no metadata, could be node */
}
fallthrough;
case maple_arange_64:
-   meta = ma_meta(mn, mt);
+   meta = ma_meta(mn, type);
break;
default:
return;
@@ -5497,7 +5503,7 @@ static inline int mas_rev_alloc(struct ma_state *mas, 
unsigned long min,
 }
 
 /*
- * mas_dead_leaves() - Mark all leaves of a node as dead.
+ * mte_dead_leaves() - Mark all leaves of a node as dead.
  * @mas: The maple state
  * @slots: Pointer to the slot array
  * @type: The maple node type
@@ -5507,16 +5513,16 @@ static inline int mas_rev_alloc(struct ma_state *mas, 
unsigned long min,
  * Return: The number of leaves marked as dead.
  */
 static inline
-unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
- enum maple_type mt)
+unsigned char mte_dead_leaves(struct maple_enode *enode, struct maple_tree *mt,
+ void __rcu **slots)
 {
struct maple_node *node;
enum maple_type type;
void *entry;
int offset;
 
-   for (offset = 0; offset < mt_slots[mt]; offset++) {
-   entry = mas_slot_locked(mas, slots, offset);
+   for (offset = 0; offset < mt_slot_count(enode); offset++) {
+   entry = mt_slot(mt, slots, offset);
type = mte_node_type(entry);
node = mte_to_node(entry);
/* 

[PATCH v3 06/35] maple_tree: Add smp_rmb() to dead node detection

2023-02-15 Thread Suren Baghdasaryan
From: "Liam R. Howlett" 

Add an smp_rmb() before reading the parent pointer to ensure that
anything read from the node prior to the parent pointer hasn't been
reordered ahead of this check.

This is necessary for RCU mode.
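
A sketch (mine) of the ordering this enforces; after this patch the barrier
lives inside ma_dead_node()/mte_dead_node() themselves (parent_of() below
stands in for the masked node->parent read):

	/* Reader: */
	pivots = ma_pivots(node, type);	/* node reads that must complete... */
	smp_rmb();			/* ...before the parent pointer is read;
					 * pairs with the writer's smp_wmb()
					 * when the node is marked dead */
	if (parent_of(node) == node)	/* dead: discard what was read */
		goto retry;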

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett 
Signed-off-by: Suren Baghdasaryan 
---
 lib/maple_tree.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 6b6eddadd9d2..8ad2d1669fad 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -539,9 +539,11 @@ static inline struct maple_node *mte_parent(const struct 
maple_enode *enode)
  */
 static inline bool ma_dead_node(const struct maple_node *node)
 {
-   struct maple_node *parent = (void *)((unsigned long)
-node->parent & ~MAPLE_NODE_MASK);
+   struct maple_node *parent;
 
+   /* Do not reorder reads from the node prior to the parent check */
+   smp_rmb();
+   parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
return (parent == node);
 }
 
@@ -556,6 +558,8 @@ static inline bool mte_dead_node(const struct maple_enode 
*enode)
struct maple_node *parent, *node;
 
node = mte_to_node(enode);
+   /* Do not reorder reads from the node prior to the parent check */
+   smp_rmb();
parent = mte_parent(enode);
return (parent == node);
 }
-- 
2.39.1



[PATCH v3 05/35] maple_tree: Fix write memory barrier of nodes once dead for RCU mode

2023-02-15 Thread Suren Baghdasaryan
From: "Liam R. Howlett" 

During the development of the maple tree, the strategy of freeing
multiple nodes changed and, in the process, the pivots were reused to
store pointers to dead nodes.  To ensure the readers see accurate
pivots, the writers need to mark the nodes as dead and call smp_wmb() to
ensure any readers can identify the node as dead before using the pivot
values.

There were two places where the old method of marking the node as dead
without smp_wmb() was being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead.  Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.

Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being
freed are marked as dead, and that there are no other call paths besides
the two updated paths.

This is necessary for the RCU mode of the maple tree.
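
Accordingly, every freeing path must now mark the node first, which is
exactly what the test-suite hunks below add:

	mn->parent = ma_parent_ptr(mn);	/* mark dead: parent points to self */
	ma_free_rcu(mn);		/* would WARN_ON() without the mark */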

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett 
Signed-off-by: Suren Baghdasaryan 
---
 lib/maple_tree.c |  7 +--
 tools/testing/radix-tree/maple.c | 16 
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 3d5ab02f981a..6b6eddadd9d2 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -185,7 +185,7 @@ static void mt_free_rcu(struct rcu_head *head)
  */
 static void ma_free_rcu(struct maple_node *node)
 {
-   node->parent = ma_parent_ptr(node);
+   WARN_ON(node->parent != ma_parent_ptr(node));
	call_rcu(&node->rcu, mt_free_rcu);
 }
 
@@ -1778,8 +1778,10 @@ static inline void mas_replace(struct ma_state *mas, 
bool advanced)
rcu_assign_pointer(slots[offset], mas->node);
}
 
-   if (!advanced)
+   if (!advanced) {
+   mte_set_node_dead(old_enode);
mas_free(mas, old_enode);
+   }
 }
 
 /*
@@ -4218,6 +4220,7 @@ static inline bool mas_wr_node_store(struct ma_wr_state 
*wr_mas)
 done:
mas_leaf_set_meta(mas, newnode, dst_pivots, maple_leaf_64, new_end);
if (in_rcu) {
+   mte_set_node_dead(mas->node);
mas->node = mt_mk_node(newnode, wr_mas->type);
mas_replace(mas, false);
} else {
diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
index 958ee9bdb316..4c89ff333f6f 100644
--- a/tools/testing/radix-tree/maple.c
+++ b/tools/testing/radix-tree/maple.c
@@ -108,6 +108,7 @@ static noinline void check_new_node(struct maple_tree *mt)
MT_BUG_ON(mt, mn->slot[1] != NULL);
	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
 
+   mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
mas.node = MAS_START;
	mas_nomem(&mas, GFP_KERNEL);
@@ -160,6 +161,7 @@ static noinline void check_new_node(struct maple_tree *mt)
		MT_BUG_ON(mt, mas_allocated(&mas) != i);
MT_BUG_ON(mt, !mn);
MT_BUG_ON(mt, not_empty(mn));
+   mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
}
 
@@ -192,6 +194,7 @@ static noinline void check_new_node(struct maple_tree *mt)
MT_BUG_ON(mt, not_empty(mn));
		MT_BUG_ON(mt, mas_allocated(&mas) != i - 1);
MT_BUG_ON(mt, !mn);
+   mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
}
 
@@ -210,6 +213,7 @@ static noinline void check_new_node(struct maple_tree *mt)
		mn = mas_pop_node(&mas);
		MT_BUG_ON(mt, not_empty(mn));
		MT_BUG_ON(mt, mas_allocated(&mas) != j - 1);
+   mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
}
	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -233,6 +237,7 @@ static noinline void check_new_node(struct maple_tree *mt)
		MT_BUG_ON(mt, mas_allocated(&mas) != i - j);
		mn = mas_pop_node(&mas);
MT_BUG_ON(mt, not_empty(mn));
+   mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
		MT_BUG_ON(mt, mas_allocated(&mas) != i - j - 1);
}
@@ -269,6 +274,7 @@ static noinline void check_new_node(struct maple_tree *mt)
		mn = mas_pop_node(&mas); /* get the next node. */
MT_BUG_ON(mt, mn == NULL);
MT_BUG_ON(mt, not_empty(mn));
+   mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
}
	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -294,6 +300,7 @@ static noinline void check_new_node(struct maple_tree *mt)
		mn = mas_pop_node(&mas); /* get the next node. */
MT_BUG_ON(mt, mn == NULL);
MT_BUG_ON(mt, not_empty(mn));
+   mn->parent = ma_parent_ptr(mn);

[PATCH v3 04/35] maple_tree: remove extra smp_wmb() from mas_dead_leaves()

2023-02-15 Thread Suren Baghdasaryan
From: Liam Howlett 

The preceding call to mte_set_node_dead() already issues an smp_wmb(),
so the explicit smp_wmb() is not needed.  This is an optimization for the RCU
mode of the maple tree.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett 
Signed-off-by: Suren Baghdasaryan 
---
 lib/maple_tree.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 44d6ce30b28e..3d5ab02f981a 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5517,7 +5517,6 @@ unsigned char mas_dead_leaves(struct ma_state *mas, void 
__rcu **slots,
break;
 
mte_set_node_dead(entry);
-   smp_wmb(); /* Needed for RCU */
node->type = type;
rcu_assign_pointer(slots[offset], node);
}
-- 
2.39.1



[PATCH v3 03/35] maple_tree: Fix freeing of nodes in rcu mode

2023-02-15 Thread Suren Baghdasaryan
From: Liam Howlett 

The walk to destroy the nodes was not always setting the node type and
would result in a destroy method potentially using the values as nodes.
Avoid this by setting the correct node types.  This is necessary for the
RCU mode of the maple tree.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett 
Signed-off-by: Suren Baghdasaryan 
---
 lib/maple_tree.c | 73 
 1 file changed, 62 insertions(+), 11 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 089cd76ec379..44d6ce30b28e 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -902,6 +902,44 @@ static inline void ma_set_meta(struct maple_node *mn, enum 
maple_type mt,
meta->end = end;
 }
 
+/*
+ * mas_clear_meta() - clear the metadata information of a node, if it exists
+ * @mas: The maple state
+ * @mn: The maple node
+ * @mt: The maple node type
+ * @offset: The offset of the highest sub-gap in this node.
+ * @end: The end of the data in this node.
+ */
+static inline void mas_clear_meta(struct ma_state *mas, struct maple_node *mn,
+ enum maple_type mt)
+{
+   struct maple_metadata *meta;
+   unsigned long *pivots;
+   void __rcu **slots;
+   void *next;
+
+   switch (mt) {
+   case maple_range_64:
+   pivots = mn->mr64.pivot;
+   if (unlikely(pivots[MAPLE_RANGE64_SLOTS - 2])) {
+   slots = mn->mr64.slot;
+   next = mas_slot_locked(mas, slots,
+  MAPLE_RANGE64_SLOTS - 1);
+			if (unlikely((mte_to_node(next) && mte_node_type(next))))
+				return; /* The last slot is a node, no metadata */
+   }
+   fallthrough;
+   case maple_arange_64:
+   meta = ma_meta(mn, mt);
+   break;
+   default:
+   return;
+   }
+
+   meta->gap = 0;
+   meta->end = 0;
+}
+
 /*
  * ma_meta_end() - Get the data end of a node from the metadata
  * @mn: The maple node
@@ -5455,20 +5493,22 @@ static inline int mas_rev_alloc(struct ma_state *mas, 
unsigned long min,
  * mas_dead_leaves() - Mark all leaves of a node as dead.
  * @mas: The maple state
  * @slots: Pointer to the slot array
+ * @type: The maple node type
  *
  * Must hold the write lock.
  *
  * Return: The number of leaves marked as dead.
  */
 static inline
-unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots)
+unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
+ enum maple_type mt)
 {
struct maple_node *node;
enum maple_type type;
void *entry;
int offset;
 
-   for (offset = 0; offset < mt_slot_count(mas->node); offset++) {
+   for (offset = 0; offset < mt_slots[mt]; offset++) {
entry = mas_slot_locked(mas, slots, offset);
type = mte_node_type(entry);
node = mte_to_node(entry);
@@ -5487,14 +5527,13 @@ unsigned char mas_dead_leaves(struct ma_state *mas, 
void __rcu **slots)
 
 static void __rcu **mas_dead_walk(struct ma_state *mas, unsigned char offset)
 {
-   struct maple_node *node, *next;
+   struct maple_node *next;
void __rcu **slots = NULL;
 
next = mas_mn(mas);
do {
-   mas->node = ma_enode_ptr(next);
-   node = mas_mn(mas);
-   slots = ma_slots(node, node->type);
+   mas->node = mt_mk_node(next, next->type);
+   slots = ma_slots(next, next->type);
next = mas_slot_locked(mas, slots, offset);
offset = 0;
} while (!ma_is_leaf(next->type));
@@ -5558,11 +5597,14 @@ static inline void __rcu **mas_destroy_descend(struct 
ma_state *mas,
node = mas_mn(mas);
slots = ma_slots(node, mte_node_type(mas->node));
next = mas_slot_locked(mas, slots, 0);
-   if ((mte_dead_node(next)))
+   if ((mte_dead_node(next))) {
+   mte_to_node(next)->type = mte_node_type(next);
next = mas_slot_locked(mas, slots, 1);
+   }
 
mte_set_node_dead(mas->node);
node->type = mte_node_type(mas->node);
+   mas_clear_meta(mas, node, node->type);
node->piv_parent = prev;
node->parent_slot = offset;
offset = 0;
@@ -5582,13 +5624,18 @@ static void mt_destroy_walk(struct maple_enode *enode, 
unsigned char ma_flags,
 
	MA_STATE(mas, &mt, 0, 0);
 
-   if (mte_is_leaf(enode))
+   mas.node = enode;
+   if (mte_is_leaf(enode)) {
+   node->type = mte_node_type(enode);
goto free_leaf;
+   }
 
+   ma_flags &= ~MT_FLAGS_LOCK_MASK;
	mt_init_flags(&mt, ma_flags);
	mas_lock(&mas);
 
-   

[PATCH v3 02/35] maple_tree: Detect dead nodes in mas_start()

2023-02-15 Thread Suren Baghdasaryan
From: Liam Howlett 

When initially starting a search, the root node may already be in the
process of being replaced in RCU mode.  Detect and restart the walk if
this is the case.  This is necessary for RCU mode of the maple tree.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett 
Signed-off-by: Suren Baghdasaryan 
---
 lib/maple_tree.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index cc356b8369ad..089cd76ec379 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1360,12 +1360,16 @@ static inline struct maple_enode *mas_start(struct 
ma_state *mas)
mas->max = ULONG_MAX;
mas->depth = 0;
 
+retry:
root = mas_root(mas);
/* Tree with nodes */
if (likely(xa_is_node(root))) {
mas->depth = 1;
mas->node = mte_safe_root(root);
mas->offset = 0;
+   if (mte_dead_node(mas->node))
+   goto retry;
+
return NULL;
}
 
-- 
2.39.1



[PATCH v3 00/35] Per-VMA locks

2023-02-15 Thread Suren Baghdasaryan
Previous version:
v2: https://lore.kernel.org/lkml/20230127194110.533103-1-sur...@google.com/
v1: https://lore.kernel.org/all/20230109205336.3665937-1-sur...@google.com/
RFC: https://lore.kernel.org/all/20220901173516.702122-1-sur...@google.com/

LWN article describing the feature:
https://lwn.net/Articles/906852/

The per-VMA locks idea was discussed during the SPF [1] session at LSF/MM
last year [2], which concluded with the suggestion that “a reader/writer
semaphore could be put into the VMA itself; that would have the effect of
using the VMA as a sort of range lock. There would still be contention at
the VMA level, but it would be an improvement.” This patchset implements
this suggested approach.

When handling page faults we lookup the VMA that contains the faulting
page under RCU protection and try to acquire its lock. If that fails we
fall back to using mmap_lock, similar to how SPF handled this situation.

One notable way the implementation deviates from the proposal is the way
VMAs are read-locked. During some mm updates, multiple VMAs need to be
locked until the end of the update (e.g. vma_merge, split_vma, etc).
Tracking all the locked VMAs, avoiding recursive locks, and figuring out
when it's safe to unlock previously locked VMAs would make the code more
complex. So, instead of the usual lock/unlock pattern, the proposed
solution marks a VMA as locked and provides an efficient way to:
1. Identify locked VMAs.
2. Unlock all locked VMAs in bulk.
We also postpone unlocking the locked VMAs until the end of the update,
when we do mmap_write_unlock. Potentially this keeps a VMA locked for
longer than is absolutely necessary but it results in a big reduction of
code complexity.
Read-locking a VMA is done using two sequence numbers - one in the
vm_area_struct and one in the mm_struct. VMA is considered read-locked
when these sequence numbers are equal. To read-lock a VMA we set the
sequence number in vm_area_struct to be equal to the sequence number in
mm_struct. To unlock all VMAs we increment mm_struct's seq number. This
allows for an efficient way to track locked VMAs and to drop the locks on
all VMAs at the end of the update.
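
For illustration, the fault-side check reduces to roughly this sketch
(mine; the series wraps it in a lock_vma_under_rcu() style helper):

	rcu_read_lock();
	vma = find_vma(mm, address);		/* maple tree walk under RCU */
	if (!vma || !vma_start_read(vma)) {	/* read lock fails when
						 * vm_lock_seq == mm_lock_seq */
		rcu_read_unlock();
		goto fallback;			/* take mmap_read_lock() instead */
	}
	rcu_read_unlock();
	/* ... handle the fault under the per-VMA lock ... */
	vma_end_read(vma);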

The patchset implements per-VMA locking only for anonymous pages which
are not in swap, and avoids userfaultfd as its implementation is more
complex. Additional support for file-backed page faults, swapped and user
pages can be added incrementally.

Performance benchmarks show similar, although slightly smaller, benefits
than the SPF patchset (~75% of SPF benefits). Still, with lower complexity
this approach might be more desirable.

Since RFC was posted in September 2022, two separate Google teams outside
of Android evaluated the patchset and confirmed positive results. Here are
the known use cases where per-VMA locks show benefits:

Android:
Apps with a high number of threads (~100) see launch times improve by up to 20%.
Each thread mmaps several areas upon startup (Stack and Thread-local
storage (TLS), thread signal stack, indirect ref table), which requires
taking mmap_lock in write mode. Page faults take mmap_lock in read mode.
During app launch, both thread creation and page faults establishing the
active working set happen in parallel, and that causes lock contention
between mm writers and readers even if updates and page faults are
happening in different VMAs. Per-vma locks prevent this contention by
providing a more granular lock.

Google Fibers:
We have several dynamically sized thread pools that spawn new threads
under increased load and reduce their number when idling. For example,
Google's in-process scheduling/threading framework, UMCG/Fibers, is backed
by such a thread pool. When idling, only a small number of idle worker
threads are available; when a spike of incoming requests arrive, each
request is handled in its own "fiber", which is a work item posted onto a
UMCG worker thread; quite often these spikes lead to a number of new
threads spawning. Each new thread needs to allocate and register an RSEQ
section on its TLS, then register itself with the kernel as a UMCG worker
thread, and only after that it can be considered by the in-process
UMCG/Fiber scheduler as available to do useful work. In short, during an
incoming workload spike new threads have to be spawned, and they perform
several syscalls (RSEQ registration, UMCG worker registration, memory
allocations) before they can actually start doing useful work. Removing
any bottlenecks on this thread startup path will greatly improve our
services' latencies when faced with request/workload spikes.
At high scale, mmap_lock contention during thread creation and stack page
faults leads to user-visible multi-second serving latencies in a similar
pattern to Android app startup. Per-VMA locking patchset has been run
successfully in limited experiments with user-facing production workloads.
In these experiments, we observed that the peak thread creation rate was
high enough that thread creation is no longer a bottleneck.

TCP 

[PATCH v3 01/35] maple_tree: Be more cautious about dead nodes

2023-02-15 Thread Suren Baghdasaryan
From: Liam Howlett 

ma_pivots() and ma_data_end() may be called with a dead node.  Ensure
that the node isn't dead before using the returned values.

This is necessary for RCU mode of the maple tree.

Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett 
Signed-off-by: Suren Baghdasaryan 
---
 lib/maple_tree.c | 52 +++-
 1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 646297cae5d1..cc356b8369ad 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -544,6 +544,7 @@ static inline bool ma_dead_node(const struct maple_node 
*node)
 
return (parent == node);
 }
+
 /*
  * mte_dead_node() - check if the @enode is dead.
  * @enode: The encoded maple node
@@ -625,6 +626,8 @@ static inline unsigned int mas_alloc_req(const struct 
ma_state *mas)
  * @node - the maple node
  * @type - the node type
  *
+ * In the event of a dead node, this array may be %NULL
+ *
  * Return: A pointer to the maple node pivots
  */
 static inline unsigned long *ma_pivots(struct maple_node *node,
@@ -1096,8 +1099,11 @@ static int mas_ascend(struct ma_state *mas)
a_type = mas_parent_enum(mas, p_enode);
a_node = mte_parent(p_enode);
a_slot = mte_parent_slot(p_enode);
-   pivots = ma_pivots(a_node, a_type);
a_enode = mt_mk_node(a_node, a_type);
+   pivots = ma_pivots(a_node, a_type);
+
+   if (unlikely(ma_dead_node(a_node)))
+   return 1;
 
if (!set_min && a_slot) {
set_min = true;
@@ -1401,6 +1407,9 @@ static inline unsigned char ma_data_end(struct maple_node 
*node,
 {
unsigned char offset;
 
+   if (!pivots)
+   return 0;
+
if (type == maple_arange_64)
return ma_meta_end(node, type);
 
@@ -1436,6 +1445,9 @@ static inline unsigned char mas_data_end(struct ma_state 
*mas)
return ma_meta_end(node, type);
 
pivots = ma_pivots(node, type);
+   if (unlikely(ma_dead_node(node)))
+   return 0;
+
offset = mt_pivots[type] - 1;
if (likely(!pivots[offset]))
return ma_meta_end(node, type);
@@ -4505,6 +4517,9 @@ static inline int mas_prev_node(struct ma_state *mas, 
unsigned long min)
node = mas_mn(mas);
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
+   if (unlikely(ma_dead_node(node)))
+   return 1;
+
mas->max = pivots[offset];
if (offset)
mas->min = pivots[offset - 1] + 1;
@@ -4526,6 +4541,9 @@ static inline int mas_prev_node(struct ma_state *mas, 
unsigned long min)
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
offset = ma_data_end(node, mt, pivots, mas->max);
+   if (unlikely(ma_dead_node(node)))
+   return 1;
+
if (offset)
mas->min = pivots[offset - 1] + 1;
 
@@ -4574,6 +4592,7 @@ static inline int mas_next_node(struct ma_state *mas, 
struct maple_node *node,
struct maple_enode *enode;
int level = 0;
unsigned char offset;
+   unsigned char node_end;
enum maple_type mt;
void __rcu **slots;
 
@@ -4597,7 +4616,11 @@ static inline int mas_next_node(struct ma_state *mas, 
struct maple_node *node,
node = mas_mn(mas);
mt = mte_node_type(mas->node);
pivots = ma_pivots(node, mt);
-   } while (unlikely(offset == ma_data_end(node, mt, pivots, mas->max)));
+   node_end = ma_data_end(node, mt, pivots, mas->max);
+   if (unlikely(ma_dead_node(node)))
+   return 1;
+
+   } while (unlikely(offset == node_end));
 
slots = ma_slots(node, mt);
pivot = mas_safe_pivot(mas, pivots, ++offset, mt);
@@ -4613,6 +4636,9 @@ static inline int mas_next_node(struct ma_state *mas, 
struct maple_node *node,
mt = mte_node_type(mas->node);
slots = ma_slots(node, mt);
pivots = ma_pivots(node, mt);
+   if (unlikely(ma_dead_node(node)))
+   return 1;
+
offset = 0;
pivot = pivots[0];
}
@@ -4659,11 +4685,14 @@ static inline void *mas_next_nentry(struct ma_state 
*mas,
return NULL;
}
 
-   pivots = ma_pivots(node, type);
slots = ma_slots(node, type);
-   mas->index = mas_safe_min(mas, pivots, mas->offset);
+   pivots = ma_pivots(node, type);
count = ma_data_end(node, type, pivots, mas->max);
-   if (ma_dead_node(node))
+   if (unlikely(ma_dead_node(node)))
+   return NULL;
+
+   mas->index = mas_safe_min(mas, pivots, mas->offset);
+   if (unlikely(ma_dead_node(node)))
return NULL;
 
  

[PATCH 1/2] kcsan: xtensa: Add atomic builtin stubs for 32-bit systems

2023-02-15 Thread Rohan McLure
KCSAN instruments calls to atomic builtins, and will in turn call these
builtins itself. As such, architectures supporting KCSAN must have
compiler support for these atomic primitives.

Since 32-bit systems are unlikely to have 64-bit compiler builtins,
provide a stub for each missing builtin, and use BUG() to assert
unreachability.

In commit 725aea873261 ("xtensa: enable KCSAN"), xtensa implements these
locally. Move these definitions to be accessible to all 32-bit
architectures that do not provide the necessary builtins, with opt in
for PowerPC and xtensa.
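
For reference, the stubs being moved keep the shape they already have in
the xtensa file, roughly:

	#include <linux/bug.h>
	#include <linux/types.h>

	/* 64-bit atomic builtins that a 32-bit compiler may emit calls to;
	 * they must never actually run, hence BUG(). */
	void __atomic_store_8(volatile void *p, u64 v, int i)
	{
		BUG();
	}

	u64 __atomic_load_8(const volatile void *p, int i)
	{
		BUG();
	}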

Signed-off-by: Rohan McLure 
Reviewed-by: Max Filippov 
---
Previously issued as a part of a patch series adding KCSAN support to
64-bit.
Link: 
https://lore.kernel.org/linuxppc-dev/167646486000.1421441.10070059569986228558.b4...@ellerman.id.au/T/#t
v1: Remove __has_builtin check, as gcc is not obligated to inline
builtins detected using this check, but instead is permitted to supply
them in libatomic:
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108734
Instead, opt-in PPC32 and xtensa.
---
 arch/xtensa/lib/Makefile  | 1 -
 kernel/kcsan/Makefile | 2 ++
 arch/xtensa/lib/kcsan-stubs.c => kernel/kcsan/stubs.c | 0
 3 files changed, 2 insertions(+), 1 deletion(-)
 rename arch/xtensa/lib/kcsan-stubs.c => kernel/kcsan/stubs.c (100%)

diff --git a/arch/xtensa/lib/Makefile b/arch/xtensa/lib/Makefile
index 7ecef0519a27..d69356dc97df 100644
--- a/arch/xtensa/lib/Makefile
+++ b/arch/xtensa/lib/Makefile
@@ -8,5 +8,4 @@ lib-y   += memcopy.o memset.o checksum.o \
   divsi3.o udivsi3.o modsi3.o umodsi3.o mulsi3.o umulsidi3.o \
   usercopy.o strncpy_user.o strnlen_user.o
 lib-$(CONFIG_PCI) += pci-auto.o
-lib-$(CONFIG_KCSAN) += kcsan-stubs.o
 KCSAN_SANITIZE_kcsan-stubs.o := n
diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
index 8cf70f068d92..86dd713d8855 100644
--- a/kernel/kcsan/Makefile
+++ b/kernel/kcsan/Makefile
@@ -12,6 +12,8 @@ CFLAGS_core.o := $(call cc-option,-fno-conserve-stack) \
-fno-stack-protector -DDISABLE_BRANCH_PROFILING
 
 obj-y := core.o debugfs.o report.o
+obj-$(CONFIG_PPC32) += stubs.o
+obj-$(CONFIG_XTENSA) += stubs.o
 
 KCSAN_INSTRUMENT_BARRIERS_selftest.o := y
 obj-$(CONFIG_KCSAN_SELFTEST) += selftest.o
diff --git a/arch/xtensa/lib/kcsan-stubs.c b/kernel/kcsan/stubs.c
similarity index 100%
rename from arch/xtensa/lib/kcsan-stubs.c
rename to kernel/kcsan/stubs.c
-- 
2.37.2



[PATCH 2/2] powerpc/{32,book3e}: kcsan: Extend KCSAN Support

2023-02-15 Thread Rohan McLure
Enable HAVE_ARCH_KCSAN on all powerpc platforms, permitting use of the
kernel concurrency sanitiser through the CONFIG_KCSAN_* kconfig options.

Boots and passes selftests on 32-bit and 64-bit platforms. See
documentation in Documentation/dev-tools/kcsan.rst for more information.

Signed-off-by: Rohan McLure 
---
New patch
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2c9cdf1d8761..45771448d47a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -197,7 +197,7 @@ config PPC
select HAVE_ARCH_KASAN  if PPC_RADIX_MMU
select HAVE_ARCH_KASAN  if PPC_BOOK3E_64
select HAVE_ARCH_KASAN_VMALLOC  if HAVE_ARCH_KASAN
-   select HAVE_ARCH_KCSAN  if PPC_BOOK3S_64
+   select HAVE_ARCH_KCSAN
select HAVE_ARCH_KFENCE if ARCH_SUPPORTS_DEBUG_PAGEALLOC
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
select HAVE_ARCH_KGDB
-- 
2.37.2



linux-next: build warning after merge of the powerpc tree

2023-02-15 Thread Stephen Rothwell
Hi all,

After merging the powerpc tree, today's linux-next build (powerpc
pseries_le_defconfig) produced this warning:

arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x6128: unannotated 
intra-function call

I have no idea what caused this.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v2] usb: fix some spelling mistakes in comment of gadget

2023-02-15 Thread Randy Dunlap



On 2/15/23 17:35, Zhou nan wrote:
> usb: Fix spelling mistake in comments of gadget.
> 
> Signed-off-by: Zhou nan 

Acked-by: Randy Dunlap 

Thanks.

> ---
> v2:
> - Modify the title and description text.
> ---
>  drivers/usb/gadget/udc/fsl_udc_core.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)


-- 
~Randy


[PATCH v2] usb: fix some spelling mistakes in comment of gadget

2023-02-15 Thread Zhou nan
usb: Fix spelling mistake in comments of gadget.

Signed-off-by: Zhou nan 
---
v2:
- Modify the title and description text.
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c 
b/drivers/usb/gadget/udc/fsl_udc_core.c
index a67873a074b7..da876d09fc01 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -471,7 +471,7 @@ static int dr_ep_get_stall(unsigned char ep_num, unsigned 
char dir)
 /
 
 /*--
-* struct_ep_qh_setup(): set the Endpoint Capabilites field of QH
+* struct_ep_qh_setup(): set the Endpoint Capabilities field of QH
  * @zlt: Zero Length Termination Select (1: disable; 0: enable)
  * @mult: Mult field
  --*/
@@ -483,7 +483,7 @@ static void struct_ep_qh_setup(struct fsl_udc *udc, 
unsigned char ep_num,
	struct ep_queue_head *p_QH = &udc->ep_qh[2 * ep_num + dir];
unsigned int tmp = 0;
 
-   /* set the Endpoint Capabilites in QH */
+   /* set the Endpoint Capabilities in QH */
switch (ep_type) {
case USB_ENDPOINT_XFER_CONTROL:
/* Interrupt On Setup (IOS). for control ep  */
@@ -593,7 +593,7 @@ static int fsl_ep_enable(struct usb_ep *_ep,
ep->stopped = 0;
 
/* Controller related setup */
-   /* Init EPx Queue Head (Ep Capabilites field in QH
+   /* Init EPx Queue Head (Ep Capabilities field in QH
 * according to max, zlt, mult) */
struct_ep_qh_setup(udc, (unsigned char) ep_index(ep),
(unsigned char) ((desc->bEndpointAddress & USB_DIR_IN)
@@ -1361,7 +1361,7 @@ static void ch9getstatus(struct fsl_udc *udc, u8 
request_type, u16 value,
udc->ep0_dir = USB_DIR_IN;
/* Borrow the per device status_req */
req = udc->status_req;
-   /* Fill in the reqest structure */
+   /* Fill in the request structure */
*((u16 *) req->req.buf) = cpu_to_le16(tmp);
 
req->ep = ep;
-- 
2.27.0



Re: [PATCH v1 5/5] powerpc/epapr: Don't use wrteei on non booke

2023-02-15 Thread Pali Rohár
On Tuesday 20 December 2022 21:21:11 Pali Rohár wrote:
> On Monday 19 December 2022 19:46:00 Christophe Leroy wrote:
> > wrteei is only for booke. Use the standard mfmsr/ori/mtmsr
> > when non booke.
> > 
> > Reported-by: Jan-Benedict Glaw 
> > Signed-off-by: Christophe Leroy 
> > ---
> > Not sure this is needed at all, the commit that introduced the code says it 
> > is for e500, but there's no such limitation in Kconfig. Maybe we should 
> > limit all the file to CONFIG_PPC_E500
> 
> This ePAPR code is according to ePAPR v1.1. So it does not have to be
> e500 specific. But is there anything else in this category?

Scott Wood: Do you know any details about it?

> > ---
> >  arch/powerpc/kernel/epapr_hcalls.S | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/powerpc/kernel/epapr_hcalls.S 
> > b/arch/powerpc/kernel/epapr_hcalls.S
> > index 69a912550577..033116e465d0 100644
> > --- a/arch/powerpc/kernel/epapr_hcalls.S
> > +++ b/arch/powerpc/kernel/epapr_hcalls.S
> > @@ -21,7 +21,13 @@ _GLOBAL(epapr_ev_idle)
> > ori r4, r4,_TLF_NAPPING /* so when we take an exception */
> > PPC_STL r4, TI_LOCAL_FLAGS(r2)  /* it will return to our caller */
> >  
> > +#ifdef CONFIG_BOOKE_OR_40x
> > wrteei  1
> > +#else
> > +   mfmsr   r4
> > +   ori r4, r4, MSR_EE
> > +   mtmsr   r4
> > +#endif
> >  
> >  idle_loop:
> > LOAD_REG_IMMEDIATE(r11, EV_HCALL_TOKEN(EV_IDLE))
> > -- 
> > 2.38.1
> > 


[PATCH v8 7/7] powerpc: mm: Support page table check

2023-02-15 Thread Rohan McLure
On creation and clearing of a page table mapping, instrument such calls
by invoking page_table_check_pte_set and page_table_check_pte_clear
respectively. These calls serve as a sanity check against illegal
mappings.

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all platforms.

Use set_pte internally; this function reassigns a page table entry
without instrumentation. Generic code remains instrumented, as it
references set_pte_at.

See also:

riscv support in commit 3fee229a8eb9 ("riscv/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137f5 ("arm64/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
check")

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
V2: Update spacing and types assigned to pte_update calls.
V3: Update one last pte_update call to remove __pte invocation.
V5: Fix 32-bit nohash double set
V6: Omit __set_pte_at instrumentation - should be instrumented by
set_pte_at, with set_pte in between, performing all prior checks.
Instrument pmds. Use set_pte where needed.
V7: Make set_pte_at an inline function. Fix commit message.
Detail changes of internal references to set_pte_at, and its semantics.
V8: Move linux/page_table_check.h import to be below
{nohash,book3s}/pgtable.h includes.
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  8 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 44 
 arch/powerpc/include/asm/nohash/32/pgtable.h |  7 +++-
 arch/powerpc/include/asm/nohash/64/pgtable.h |  8 +++-
 arch/powerpc/include/asm/pgtable.h   | 10 -
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 16 ---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 10 ++---
 arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
 arch/powerpc/mm/pgtable_32.c |  2 +-
 11 files changed, 83 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2c9cdf1d8761..2474e2699037 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -154,6 +154,7 @@ config PPC
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEBUG_PAGEALLOCif PPC_BOOK3S || PPC_8xx || 40x
+   select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index afd672e84791..8850b4fb22a4 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -53,6 +53,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include 
+
 static inline bool pte_user(pte_t pte)
 {
return pte_val(pte) & _PAGE_USER;
@@ -338,7 +340,11 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
   pte_t *ptep)
 {
-   return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index a6ed93d01da1..0c6838875720 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -162,6 +162,8 @@
 #define PAGE_KERNEL_ROX__pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
 
 #ifndef __ASSEMBLY__
+#include 
+
 /*
  * page table defines
  */
@@ -431,8 +433,11 @@ static inline void huge_ptep_set_wrprotect(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-   return __pte(old);
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
@@ -441,11 +446,16 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
pte_t *ptep, int full)
 {
if (full && radix_enabled()) {
+   pte_t old_pte;
+
/*
 * We know that this is a full mm pte clear and
 * hence can be sure there is no parallel set_pte.
 */
-   return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;

[PATCH v8 6/7] powerpc: mm: Add p{te,md,ud}_user_accessible_page helpers

2023-02-15 Thread Rohan McLure
Add the following helpers for detecting whether a page table entry
is a leaf and is accessible to user space.

 * pte_user_accessible_page
 * pmd_user_accessible_page
 * pud_user_accessible_page

Also implement missing pud_user definitions for both Book3S/nohash 64-bit
systems, and pmd_user for Book3S/nohash 32-bit systems.
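
For context, the generic page table check consumes these helpers along
these lines (a simplified sketch based on mm/page_table_check.c; the real
function is internal to that file):

	static void example_on_pte_clear(struct mm_struct *mm, unsigned long addr,
					 pte_t pte)
	{
		if (pte_user_accessible_page(pte))
			page_table_check_clear(mm, addr, pte_pfn(pte), 1);
	}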

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
V2: Provide missing pud_user implementations, use p{u,m}d_is_leaf.
V3: Provide missing pmd_user implementations as stubs in 32-bit.
V4: Use pmd_leaf, pud_leaf, and define pmd_user for 32 Book3E with
static inline method rather than macro.
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  4 
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 ++
 arch/powerpc/include/asm/nohash/32/pgtable.h |  5 +
 arch/powerpc/include/asm/nohash/64/pgtable.h | 10 ++
 arch/powerpc/include/asm/pgtable.h   | 15 +++
 5 files changed, 44 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index a090cb13a4a0..afd672e84791 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -516,6 +516,10 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
 }
 
+static inline bool pmd_user(pmd_t pmd)
+{
+   return 0;
+}
 
 
 /* This low level function performs the actual PTE insertion
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index df5ee856444d..a6ed93d01da1 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -538,6 +538,16 @@ static inline bool pte_user(pte_t pte)
return !(pte_raw(pte) & cpu_to_be64(_PAGE_PRIVILEGED));
 }
 
+static inline bool pmd_user(pmd_t pmd)
+{
+   return !(pmd_raw(pmd) & cpu_to_be64(_PAGE_PRIVILEGED));
+}
+
+static inline bool pud_user(pud_t pud)
+{
+   return !(pud_raw(pud) & cpu_to_be64(_PAGE_PRIVILEGED));
+}
+
 #define pte_access_permitted pte_access_permitted
 static inline bool pte_access_permitted(pte_t pte, bool write)
 {
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 70edad44dff6..d953533c56ff 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -209,6 +209,11 @@ static inline void pmd_clear(pmd_t *pmdp)
*pmdp = __pmd(0);
 }
 
+static inline bool pmd_user(pmd_t pmd)
+{
+   return false;
+}
+
 /*
  * PTE updates. This function is called whenever an existing
  * valid PTE is updated. This does -not- include set_pte_at()
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index d391a45e0f11..14e69ebad31f 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -123,6 +123,11 @@ static inline pte_t pmd_pte(pmd_t pmd)
return __pte(pmd_val(pmd));
 }
 
+static inline bool pmd_user(pmd_t pmd)
+{
+   return (pmd_val(pmd) & _PAGE_USER) == _PAGE_USER;
+}
+
 #define pmd_none(pmd)  (!pmd_val(pmd))
 #definepmd_bad(pmd)(!is_kernel_addr(pmd_val(pmd)) \
 || (pmd_val(pmd) & PMD_BAD_BITS))
@@ -164,6 +169,11 @@ static inline pte_t pud_pte(pud_t pud)
return __pte(pud_val(pud));
 }
 
+static inline bool pud_user(pud_t pud)
+{
+   return (pud_val(pud) & _PAGE_USER) == _PAGE_USER;
+}
+
 static inline pud_t pte_pud(pte_t pte)
 {
return __pud(pte_val(pte));
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index ad0829f816e9..b76fdb80b6c9 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -167,6 +167,21 @@ static inline int pud_pfn(pud_t pud)
 }
 #endif
 
+static inline bool pte_user_accessible_page(pte_t pte)
+{
+   return pte_present(pte) && pte_user(pte);
+}
+
+static inline bool pmd_user_accessible_page(pmd_t pmd)
+{
+   return pmd_leaf(pmd) && pmd_present(pmd) && pmd_user(pmd);
+}
+
+static inline bool pud_user_accessible_page(pud_t pud)
+{
+   return pud_leaf(pud) && pud_present(pud) && pud_user(pud);
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.37.2



[PATCH v8 5/7] powerpc: mm: Add common pud_pfn stub for all platforms

2023-02-15 Thread Rohan McLure
Prior to this commit, pud_pfn was implemented with BUILD_BUG as an inline
function for 64-bit Book3S systems, but was never used, as its invocations
in generic code are guarded by calls to pud_devmap, which returns zero on
such systems. A future patch will provide support for page table checks,
the generic code for which depends on a pud_pfn stub being implemented,
even though the patch will not interact with puds directly.

Remove the 64-bit Book3S stub and define pud_pfn to warn on all
platforms. pud_pfn may be defined properly on a per-platform basis
should it grow real usages in future.

Signed-off-by: Rohan McLure 
---
V2: Remove conditional BUILD_BUG and BUG. Instead warn on usage.
V3: Replace WARN with WARN_ONCE, which should suffice to demonstrate
misuse of puds.
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 --
 arch/powerpc/include/asm/pgtable.h   | 14 ++
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 589d2dbe3873..df5ee856444d 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1327,16 +1327,6 @@ static inline int pgd_devmap(pgd_t pgd)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-static inline int pud_pfn(pud_t pud)
-{
-   /*
-* Currently all calls to pud_pfn() are gated around a pud_devmap()
-* check so this should never be used. If it grows another user we
-* want to know about it.
-*/
-   BUILD_BUG();
-   return 0;
-}
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
 pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
 void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 284408829fa3..ad0829f816e9 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -153,6 +153,20 @@ struct seq_file;
 void arch_report_meminfo(struct seq_file *m);
 #endif /* CONFIG_PPC64 */
 
+/*
+ * Currently only consumed by page_table_check_pud_{set,clear}. Since clears
+ * and sets to page table entries at any level are done through
+ * page_table_check_pte_{set,clear}, provide stub implementation.
+ */
+#ifndef pud_pfn
+#define pud_pfn pud_pfn
+static inline int pud_pfn(pud_t pud)
+{
+   WARN_ONCE(1, "pud: platform does not use pud entries directly");
+   return 0;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.37.2



[PATCH v8 4/7] powerpc: mm: Implement p{m,u,4}d_leaf on all platforms

2023-02-15 Thread Rohan McLure
The check that a higher-level entry in a multi-level page table contains a
page translation entry (pte) is performed by p{m,u,4}d_leaf stubs, which may
be specialised for each choice of mmu. In a prior commit, we replaced uses
of the catch-all stubs p{m,u,4}d_is_leaf with p{m,u,4}d_leaf.

Replace the catch-all stub definitions for p{m,u,4}d_is_leaf with
definitions for p{m,u,4}d_leaf. A future patch will assume that
p{m,u,4}d_leaf is defined on all platforms.

In particular, implement pud_leaf for Book3E-64, pmd_leaf for all Book3E
and Book3S-64 platforms, with a catch-all definition for p4d_leaf.

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v5: Split patch that replaces p{m,u,4}d_is_leaf into two patches, first
replacing callsites and afterward providing generic definition.
Remove ifndef-defines implementing p{m,u}d_leaf in favour of
implementing stubs in headers belonging to the particular platforms
needing them.
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 -
 arch/powerpc/include/asm/nohash/64/pgtable.h |  6 ++
 arch/powerpc/include/asm/nohash/pgtable.h|  6 ++
 arch/powerpc/include/asm/pgtable.h   | 22 ++--
 5 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 75823f39e042..a090cb13a4a0 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -242,6 +242,11 @@ static inline void pmd_clear(pmd_t *pmdp)
*pmdp = __pmd(0);
 }
 
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
+{
+   return false;
+}
 
 /*
  * When flushing the tlb entry for a page, we also need to flush the hash
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7e0d546f4b3c..589d2dbe3873 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1359,16 +1359,14 @@ static inline bool is_pte_rw_upgrade(unsigned long 
old_val, unsigned long new_va
 /*
  * Like pmd_huge() and pmd_large(), but works regardless of config options
  */
-#define pmd_is_leaf pmd_is_leaf
-#define pmd_leaf pmd_is_leaf
-static inline bool pmd_is_leaf(pmd_t pmd)
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
 {
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
 }
 
-#define pud_is_leaf pud_is_leaf
-#define pud_leaf pud_is_leaf
-static inline bool pud_is_leaf(pud_t pud)
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
 {
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
 }
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 879e9a6e5a87..d391a45e0f11 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -141,6 +141,12 @@ static inline void pud_clear(pud_t *pudp)
*pudp = __pud(0);
 }
 
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
+{
+   return false;
+}
+
 #define pud_none(pud)  (!pud_val(pud))
 #definepud_bad(pud)(!is_kernel_addr(pud_val(pud)) \
 || (pud_val(pud) & PUD_BAD_BITS))
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index f36dd2e2d591..43b50fd8d236 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -60,6 +60,12 @@ static inline bool pte_hw_valid(pte_t pte)
return pte_val(pte) & _PAGE_PRESENT;
 }
 
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
+{
+   return false;
+}
+
 /*
  * Don't just check for any non zero bits in __PAGE_USER, since for book3e
  * and PTE_64BIT, PAGE_KERNEL_X contains _PAGE_BAP_SR which is also in
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 17d30359d1f4..284408829fa3 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -128,29 +128,11 @@ static inline void pte_frag_set(mm_context_t *ctx, void 
*p)
 }
 #endif
 
-#ifndef pmd_is_leaf
-#define pmd_is_leaf pmd_is_leaf
-static inline bool pmd_is_leaf(pmd_t pmd)
+#define p4d_leaf p4d_leaf
+static inline bool p4d_leaf(p4d_t p4d)
 {
return false;
 }
-#endif
-
-#ifndef pud_is_leaf
-#define pud_is_leaf pud_is_leaf
-static inline bool pud_is_leaf(pud_t pud)
-{
-   return false;
-}
-#endif
-
-#ifndef p4d_is_leaf
-#define p4d_is_leaf p4d_is_leaf
-static inline bool p4d_is_leaf(p4d_t p4d)
-{
-   return false;
-}
-#endif
 
 #define pmd_pgtable pmd_pgtable
 static inline pgtable_t pmd_pgtable(pmd_t pmd)
-- 
2.37.2



[PATCH v8 3/7] powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}d_leaf

2023-02-15 Thread Rohan McLure
Replace occurrences of p{u,m,4}d_is_leaf with p{u,m,4}d_leaf, as the
latter is the name used throughout all other architectures for the
check that a higher-level entry in multi-level paging contains a page
translation entry (pte).

A future patch will implement p{u,m,4}d_leaf stubs on all platforms so
that they may be referenced in generic code.

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
V4: New patch
V5: Previously replaced stub definition for *_is_leaf with *_leaf. Do
that in a later patch
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 ++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 14 +++---
 arch/powerpc/mm/pgtable.c|  6 +++---
 arch/powerpc/mm/pgtable_64.c |  6 +++---
 arch/powerpc/xmon/xmon.c |  6 +++---
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 9d3743ca16d5..0d24fd984d16 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -497,7 +497,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t 
*pmd, bool full,
for (im = 0; im < PTRS_PER_PMD; ++im, ++p) {
if (!pmd_present(*p))
continue;
-   if (pmd_is_leaf(*p)) {
+   if (pmd_leaf(*p)) {
if (full) {
pmd_clear(p);
} else {
@@ -526,7 +526,7 @@ static void kvmppc_unmap_free_pud(struct kvm *kvm, pud_t 
*pud,
for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++p) {
if (!pud_present(*p))
continue;
-   if (pud_is_leaf(*p)) {
+   if (pud_leaf(*p)) {
pud_clear(p);
} else {
pmd_t *pmd;
@@ -629,12 +629,12 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pud = pud_alloc_one(kvm->mm, gpa);
 
pmd = NULL;
-   if (pud && pud_present(*pud) && !pud_is_leaf(*pud))
+   if (pud && pud_present(*pud) && !pud_leaf(*pud))
pmd = pmd_offset(pud, gpa);
else if (level <= 1)
new_pmd = kvmppc_pmd_alloc();
 
-   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
+   if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_leaf(*pmd)))
new_ptep = kvmppc_pte_alloc();
 
/* Check if we might have been invalidated; let the guest retry if so */
@@ -652,7 +652,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pud = NULL;
}
pud = pud_offset(p4d, gpa);
-   if (pud_is_leaf(*pud)) {
+   if (pud_leaf(*pud)) {
unsigned long hgpa = gpa & PUD_MASK;
 
/* Check if we raced and someone else has set the same thing */
@@ -703,7 +703,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
pte_t pte,
new_pmd = NULL;
}
pmd = pmd_offset(pud, gpa);
-   if (pmd_is_leaf(*pmd)) {
+   if (pmd_leaf(*pmd)) {
unsigned long lgpa = gpa & PMD_MASK;
 
/* Check if we raced and someone else has set the same thing */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 26245aaf12b8..4e46e001c3c3 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -205,14 +205,14 @@ static void radix__change_memory_range(unsigned long 
start, unsigned long end,
pudp = pud_alloc(_mm, p4dp, idx);
if (!pudp)
continue;
-   if (pud_is_leaf(*pudp)) {
+   if (pud_leaf(*pudp)) {
ptep = (pte_t *)pudp;
goto update_the_pte;
}
pmdp = pmd_alloc(_mm, pudp, idx);
if (!pmdp)
continue;
-   if (pmd_is_leaf(*pmdp)) {
+   if (pmd_leaf(*pmdp)) {
ptep = pmdp_ptep(pmdp);
goto update_the_pte;
}
@@ -786,7 +786,7 @@ static void __meminit remove_pmd_table(pmd_t *pmd_start, 
unsigned long addr,
if (!pmd_present(*pmd))
continue;
 
-   if (pmd_is_leaf(*pmd)) {
+   if (pmd_leaf(*pmd)) {
if (!IS_ALIGNED(addr, PMD_SIZE) ||
!IS_ALIGNED(next, PMD_SIZE)) {
WARN_ONCE(1, "%s: unaligned range\n", __func__);
@@ -816,7 +816,7 @@ static void __meminit remove_pud_table(pud_t *pud_start, 
unsigned long addr,
if (!pud_present(*pud))
continue;
 
-   if (pud_is_leaf(*pud)) {
+   if (pud_leaf(*pud)) {
if (!IS_ALIGNED(addr, PUD_SIZE) ||
!IS_ALIGNED(next, PUD_SIZE)) {
[PATCH v8 2/7] powerpc/64s: mm: Introduce __pmdp_collapse_flush with mm_struct argument

2023-02-15 Thread Rohan McLure
Generic code calls pmdp_collapse_flush with just three parameters, as
the mm context is implied by the vm_area parameter.

Define __pmdp_collapse_flush to accept an additional mm_struct *
parameter, with pmdp_collapse_flush an inline function that unpacks
the vma and calls __pmdp_collapse_flush. The mm_struct * parameter
is needed in a future patch providing Page Table Check support,
which is defined in terms of mm context objects.
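
A hedged sketch of the resulting call-site relationship (the expansion is
illustrative; see the diff below for the actual macro):

	/* Generic code keeps the three-argument form: */
	pmd = pmdp_collapse_flush(vma, address, pmdp);
	/* which now expands to roughly: */
	pmd = __pmdp_collapse_flush(vma, vma->vm_mm, address, pmdp);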

Signed-off-by: Rohan McLure 
---
v6: New patch
v7: Remove explicit `return' in macro. Prefix macro args with __
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index cb4c67bf45d7..7e0d546f4b3c 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1244,14 +1244,19 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
return hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
 }
 
-static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
-   unsigned long address, pmd_t *pmdp)
+static inline pmd_t __pmdp_collapse_flush(struct vm_area_struct *vma, struct 
mm_struct *mm,
+ unsigned long address, pmd_t *pmdp)
 {
if (radix_enabled())
return radix__pmdp_collapse_flush(vma, address, pmdp);
return hash__pmdp_collapse_flush(vma, address, pmdp);
 }
-#define pmdp_collapse_flush pmdp_collapse_flush
+#define pmdp_collapse_flush(__vma, __addr, __pmdp) \
+({ \
+   struct vm_area_struct *_vma = (__vma);  \
+   \
+   __pmdp_collapse_flush(_vma, _vma->vm_mm, (__addr), (__pmdp));   \
+})
 
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR_FULL
 pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma,
-- 
2.37.2



[PATCH v8 1/7] powerpc: mm: Separate set_pte, set_pte_at for internal, external use

2023-02-15 Thread Rohan McLure
Produce separate symbols for set_pte, which is to be used in
arch/powerpc for reassignment of ptes, and set_pte_at, used in generic
code.

The reason for this distinction is to support the Page Table Check
sanitiser. Having this distinction allows set_pte_at to be
instrumented, but set_pte not to be, permitting uninstrumented
internal mappings. This distinction in names is also present in x86.

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v6: new patch
v7: Remove extern, move set_pte args to be in a single line.
---
 arch/powerpc/include/asm/book3s/pgtable.h | 3 +--
 arch/powerpc/include/asm/nohash/pgtable.h | 3 +--
 arch/powerpc/include/asm/pgtable.h| 1 +
 arch/powerpc/mm/pgtable.c | 3 +--
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/pgtable.h 
b/arch/powerpc/include/asm/book3s/pgtable.h
index d18b748ea3ae..1386ed705e66 100644
--- a/arch/powerpc/include/asm/book3s/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/pgtable.h
@@ -12,8 +12,7 @@
 /* Insert a PTE, top-level function is out of line. It uses an inline
  * low level function in the respective pgtable-* files
  */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-  pte_t pte);
+void set_pte(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte);
 
 
 #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 69c3a050a3d8..f36dd2e2d591 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -154,8 +154,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 /* Insert a PTE, top-level function is out of line. It uses an inline
  * low level function in the respective pgtable-* files
  */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-  pte_t pte);
+void set_pte(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte);
 
 /* This low level function performs the actual PTE insertion
  * Setting the PTE depends on the MMU type and other factors. It's
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 9972626ddaf6..17d30359d1f4 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -48,6 +48,7 @@ struct mm_struct;
 /* Keep these as a macros to avoid include dependency mess */
 #define pte_page(x)pfn_to_page(pte_pfn(x))
 #define mk_pte(page, pgprot)   pfn_pte(page_to_pfn(page), (pgprot))
+#define set_pte_at set_pte
 /*
  * Select all bits except the pfn
  */
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..d7cce317cef8 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -187,8 +187,7 @@ static pte_t set_access_flags_filter(pte_t pte, struct 
vm_area_struct *vma,
 /*
  * set_pte stores a linux PTE into the linux page table.
  */
-void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-   pte_t pte)
+void set_pte(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
 {
/*
 * Make sure hardware valid bit is not set. We don't do
-- 
2.37.2



[PATCH v8 0/7] Support page table check

2023-02-15 Thread Rohan McLure
Support the page table check sanitiser on all PowerPC platforms. This
sanitiser works by serialising assignments, reassignments and clears of
page table entries at each level in order to ensure that anonymous
mappings have at most one writable consumer, and likewise that
file-backed mappings are not simultaneously also anonymous mappings.

In order to support this infrastructure, a number of stubs must be
defined for all powerpc platforms. Additionally, separate set_pte_at
and set_pte, to allow for internal, uninstrumented mappings.
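
For reference, a minimal sketch of enabling the sanitiser on a supported
kernel (option and parameter names taken from
Documentation/mm/page_table_check.rst):

	# build-time
	CONFIG_PAGE_TABLE_CHECK=y
	# boot-time, unless CONFIG_PAGE_TABLE_CHECK_ENFORCED=y
	page_table_check=on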

v8:
 * Fix linux/page_table_check.h include in asm/pgtable.h breaking
   32-bit.

v7:
 * Remove use of extern in set_pte prototypes
 * Clean up pmdp_collapse_flush macro
 * Replace set_pte_at with static inline function
 * Fix commit message for patch 7
Link: 
https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/

v6:
 * Support huge pages and p{m,u}d accounting.
 * Remove instrumentation from set_pte from kernel internal pages.
 * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
   as access to the mm_struct * is required.
Link: 
https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/

v5:
Link: 
https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/

Rohan McLure (7):
  powerpc: mm: Separate set_pte, set_pte_at for internal, external use
  powerpc/64s: mm: Introduce __pmdp_collapse_flush with mm_struct
argument
  powerpc: mm: Replace p{u,m,4}d_is_leaf with p{u,m,4}d_leaf
  powerpc: mm: Implement p{m,u,4}d_leaf on all platforms
  powerpc: mm: Add common pud_pfn stub for all platforms
  powerpc: mm: Add p{te,md,ud}_user_accessible_page helpers
  powerpc: mm: Support page table check

 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 17 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 85 +---
 arch/powerpc/include/asm/book3s/pgtable.h|  3 +-
 arch/powerpc/include/asm/nohash/32/pgtable.h | 12 ++-
 arch/powerpc/include/asm/nohash/64/pgtable.h | 24 +-
 arch/powerpc/include/asm/nohash/pgtable.h|  9 ++-
 arch/powerpc/include/asm/pgtable.h   | 60 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 +--
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 16 ++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 24 +++---
 arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
 arch/powerpc/mm/pgtable.c|  9 +--
 arch/powerpc/mm/pgtable_32.c |  2 +-
 arch/powerpc/mm/pgtable_64.c |  6 +-
 arch/powerpc/xmon/xmon.c |  6 +-
 17 files changed, 197 insertions(+), 93 deletions(-)

-- 
2.37.2



RE: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC

2023-02-15 Thread Leo Li


> -Original Message-
> From: Herve Codina 
> Sent: Thursday, January 26, 2023 2:32 AM
> To: Herve Codina ; Leo Li
> ; Rob Herring ; Krzysztof
> Kozlowski ; Liam Girdwood
> ; Mark Brown ; Christophe
> Leroy ; Michael Ellerman
> ; Nicholas Piggin ; Qiang Zhao
> ; Jaroslav Kysela ; Takashi Iwai
> ; Shengjiu Wang ; Xiubo Li
> ; Fabio Estevam ; Nicolin
> Chen 
> Cc: linuxppc-dev@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org;
> devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; alsa-devel@alsa-
> project.org; Thomas Petazzoni 
> Subject: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC

Typo: cpm1

> 
> The QMC (QUICC Multichannel Controller) emulates up to 64 channels within
> one serial controller using the same TDM physical interface routed from the
> TSA.
> 
> It is available in some PowerQUICC SoCs such as the
> MPC885 or MPC866.
> 
> It is also available on some Quicc Engine SoCs.
> This current version supports CPM1 SoCs only and some enhancements are
> needed to support Quicc Engine SoCs.
> 
> Signed-off-by: Herve Codina 

Otherwise looks good to me.

Acked-by: Li Yang 

> ---
>  drivers/soc/fsl/qe/Kconfig  |   12 +
>  drivers/soc/fsl/qe/Makefile |1 +
>  drivers/soc/fsl/qe/qmc.c| 1533
> +++
>  include/soc/fsl/qe/qmc.h|   71 ++
>  4 files changed, 1617 insertions(+)
>  create mode 100644 drivers/soc/fsl/qe/qmc.c  create mode 100644
> include/soc/fsl/qe/qmc.h
> 
> diff --git a/drivers/soc/fsl/qe/Kconfig b/drivers/soc/fsl/qe/Kconfig index
> 60ec11c9f4d9..25b218351ae3 100644
> --- a/drivers/soc/fsl/qe/Kconfig
> +++ b/drivers/soc/fsl/qe/Kconfig
> @@ -44,6 +44,18 @@ config CPM_TSA
> This option enables support for this
> controller
> 
> +config CPM_QMC
> + tristate "CPM QMC support"
> + depends on OF && HAS_IOMEM
> + depends on CPM1 || (PPC && COMPILE_TEST)
> + depends on CPM_TSA
> + help
> +   Freescale CPM QUICC Multichannel Controller
> +   (QMC)
> +
> +   This option enables support for this
> +   controller
> +
>  config QE_TDM
>   bool
>   default y if FSL_UCC_HDLC
> diff --git a/drivers/soc/fsl/qe/Makefile b/drivers/soc/fsl/qe/Makefile index
> 45c961acc81b..ec8506e13113 100644
> --- a/drivers/soc/fsl/qe/Makefile
> +++ b/drivers/soc/fsl/qe/Makefile
> @@ -5,6 +5,7 @@
>  obj-$(CONFIG_QUICC_ENGINE)+= qe.o qe_common.o qe_ic.o qe_io.o
>  obj-$(CONFIG_CPM)+= qe_common.o
>  obj-$(CONFIG_CPM_TSA)+= tsa.o
> +obj-$(CONFIG_CPM_QMC)+= qmc.o
>  obj-$(CONFIG_UCC)+= ucc.o
>  obj-$(CONFIG_UCC_SLOW)   += ucc_slow.o
>  obj-$(CONFIG_UCC_FAST)   += ucc_fast.o
> diff --git a/drivers/soc/fsl/qe/qmc.c b/drivers/soc/fsl/qe/qmc.c new file
> mode 100644 index ..cfa7207353e0
> --- /dev/null
> +++ b/drivers/soc/fsl/qe/qmc.c
> @@ -0,0 +1,1533 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * QMC driver
> + *
> + * Copyright 2022 CS GROUP France
> + *
> + * Author: Herve Codina   */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "tsa.h"
> +
> +/* SCC general mode register high (32 bits) */
> +#define SCC_GSMRL0x00
> +#define SCC_GSMRL_ENR(1 << 5)
> +#define SCC_GSMRL_ENT(1 << 4)
> +#define SCC_GSMRL_MODE_QMC   (0x0A << 0)
> +
> +/* SCC general mode register low (32 bits) */
> +#define SCC_GSMRH0x04
> +#define   SCC_GSMRH_CTSS (1 << 7)
> +#define   SCC_GSMRH_CDS  (1 << 8)
> +#define   SCC_GSMRH_CTSP (1 << 9)
> +#define   SCC_GSMRH_CDP  (1 << 10)
> +
> +/* SCC event register (16 bits) */
> +#define SCC_SCCE 0x10
> +#define   SCC_SCCE_IQOV  (1 << 3)
> +#define   SCC_SCCE_GINT  (1 << 2)
> +#define   SCC_SCCE_GUN   (1 << 1)
> +#define   SCC_SCCE_GOV   (1 << 0)
> +
> +/* SCC mask register (16 bits) */
> +#define SCC_SCCM 0x14
> +/* Multichannel base pointer (32 bits) */
> +#define QMC_GBL_MCBASE   0x00
> +/* Multichannel controller state (16 bits) */
> +#define QMC_GBL_QMCSTATE 0x04
> +/* Maximum receive buffer length (16 bits) */
> +#define QMC_GBL_MRBLR0x06
> +/* Tx time-slot assignment table pointer (16 bits) */
> +#define QMC_GBL_TX_S_PTR 0x08
> +/* Rx pointer (16 bits) */
> +#define QMC_GBL_RXPTR0x0A
> +/* Global receive frame threshold (16 bits) */
> +#define QMC_GBL_GRFTHR   0x0C
> +/* Global receive frame count (16 bits) */
> +#define QMC_GBL_GRFCNT   0x0E
> +/* Multichannel interrupt base address (32 bits) */
> +#define QMC_GBL_INTBASE  0x10
> +/* Multichannel interrupt pointer (32 bits) */
> +#define QMC_GBL_INTPTR   0x14
> +/* Rx time-slot assignment table pointer (16 bits) */
> +#define QMC_GBL_RX_S_PTR 0x18
> +/* Tx pointer (16 bits) */
> +#define QMC_GBL_TXPTR0x1A
> +/* CRC 

Re: [PATCH v2 00/24] cpu,sched: Mark arch_cpu_idle_dead() __noreturn

2023-02-15 Thread Paul E. McKenney
On Mon, Feb 13, 2023 at 11:05:34PM -0800, Josh Poimboeuf wrote:
> v2:
> - make arch_call_rest_init() and rest_init() __noreturn
> - make objtool 'global_returns' work for weak functions
> - rebase on tip/objtool/core with dependencies merged in (mingo)
> - add acks
> 
> v1.1:
> - add __noreturn to all arch_cpu_idle_dead() implementations (mpe)

With this, rcutorture no longer gets objtool complaints on x86, thank you!

Tested-by: Paul E. McKenney 

> Josh Poimboeuf (24):
>   alpha/cpu: Expose arch_cpu_idle_dead()'s prototype declaration
>   alpha/cpu: Make sure arch_cpu_idle_dead() doesn't return
>   arm/cpu: Make sure arch_cpu_idle_dead() doesn't return
>   arm64/cpu: Mark cpu_die() __noreturn
>   csky/cpu: Make sure arch_cpu_idle_dead() doesn't return
>   ia64/cpu: Mark play_dead() __noreturn
>   loongarch/cpu: Make sure play_dead() doesn't return
>   loongarch/cpu: Mark play_dead() __noreturn
>   mips/cpu: Expose play_dead()'s prototype definition
>   mips/cpu: Make sure play_dead() doesn't return
>   mips/cpu: Mark play_dead() __noreturn
>   powerpc/cpu: Mark start_secondary_resume() __noreturn
>   sh/cpu: Make sure play_dead() doesn't return
>   sh/cpu: Mark play_dead() __noreturn
>   sh/cpu: Expose arch_cpu_idle_dead()'s prototype definition
>   sparc/cpu: Mark cpu_play_dead() __noreturn
>   x86/cpu: Make sure play_dead() doesn't return
>   x86/cpu: Mark play_dead() __noreturn
>   xtensa/cpu: Make sure cpu_die() doesn't return
>   xtensa/cpu: Mark cpu_die() __noreturn
>   sched/idle: Make sure weak version of arch_cpu_idle_dead() doesn't
> return
>   objtool: Include weak functions in 'global_noreturns' check
>   init: Make arch_call_rest_init() and rest_init() __noreturn
>   sched/idle: Mark arch_cpu_idle_dead() __noreturn
> 
>  arch/alpha/kernel/process.c  |  4 +++-
>  arch/arm/kernel/smp.c|  4 +++-
>  arch/arm64/include/asm/smp.h |  2 +-
>  arch/arm64/kernel/process.c  |  2 +-
>  arch/csky/kernel/smp.c   |  4 +++-
>  arch/ia64/kernel/process.c   |  6 +++---
>  arch/loongarch/include/asm/smp.h |  2 +-
>  arch/loongarch/kernel/process.c  |  2 +-
>  arch/loongarch/kernel/smp.c  |  2 +-
>  arch/mips/include/asm/smp.h  |  2 +-
>  arch/mips/kernel/process.c   |  2 +-
>  arch/mips/kernel/smp-bmips.c |  3 +++
>  arch/mips/loongson64/smp.c   |  1 +
>  arch/parisc/kernel/process.c |  2 +-
>  arch/powerpc/include/asm/smp.h   |  2 +-
>  arch/powerpc/kernel/smp.c|  2 +-
>  arch/riscv/kernel/cpu-hotplug.c  |  2 +-
>  arch/s390/kernel/idle.c  |  2 +-
>  arch/s390/kernel/setup.c |  2 +-
>  arch/sh/include/asm/smp-ops.h|  5 +++--
>  arch/sh/kernel/idle.c|  3 ++-
>  arch/sparc/include/asm/smp_64.h  |  2 +-
>  arch/sparc/kernel/process_64.c   |  2 +-
>  arch/x86/include/asm/smp.h   |  3 ++-
>  arch/x86/kernel/process.c|  4 ++--
>  arch/xtensa/include/asm/smp.h|  2 +-
>  arch/xtensa/kernel/smp.c |  4 +++-
>  include/linux/cpu.h  |  2 +-
>  include/linux/start_kernel.h |  4 ++--
>  init/main.c  |  4 ++--
>  kernel/sched/idle.c  |  2 +-
>  tools/objtool/check.c| 11 +++
>  32 files changed, 57 insertions(+), 39 deletions(-)
> 
> -- 
> 2.39.1
> 


Re: [PATCH v1 3/9] powerpc/47x: Split ppc47x machine in two

2023-02-15 Thread Michael Ellerman
Christophe Leroy  writes:
> This machine matches two compatibles and sets .pci_irq_fixup
> on one of them.
>
> Split it into two machines, then the probe function can be dropped.

That will break the machine initcalls that look for ppc47x:

  arch/powerpc/platforms/44x/ppc476.c:machine_device_initcall(ppc47x, 
ppc47x_device_probe);
  arch/powerpc/platforms/44x/ppc476.c:machine_arch_initcall(ppc47x, 
ppc47x_get_board_rev);

It still compiles, because of the weak attribute in machine_is(), but
those initcalls will never match and so never run.

cheers


RE: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC

2023-02-15 Thread Leo Li


> -Original Message-
> From: Christophe Leroy 
> Sent: Wednesday, February 15, 2023 10:08 AM
> To: Leo Li ; Qiang Zhao 
> Cc: linuxppc-dev@lists.ozlabs.org; Krzysztof Kozlowski
> ; Rob Herring ;
> Herve Codina ; linux-arm-
> ker...@lists.infradead.org; devicet...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Nicholas Piggin ; Fabio
> Estevam ; Xiubo Li ;
> Shengjiu Wang ; Takashi Iwai
> ; Jaroslav Kysela ; Michael Ellerman
> ; Mark Brown ; Liam Girdwood
> ; alsa-de...@alsa-project.org; Thomas Petazzoni
> ; Nicolin Chen 
> Subject: Re: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC
> 
> Hi Li and Qiang
> 
> Le 26/01/2023 à 09:32, Herve Codina a écrit :
> > The QMC (QUICC Multichannel Controller) emulates up to 64 channels
> > within one serial controller using the same TDM physical interface
> > routed from the TSA.
> >
> > It is available in some PowerQUICC SoCs such as the
> > MPC885 or MPC866.
> >
> > It is also available on some Quicc Engine SoCs.
> > This current version supports CPM1 SoCs only and some enhancements are
> > needed to support Quicc Engine SoCs.
> 
> Do you have any comment on this patch ?
> 
> Otherwise, may I ask you to send your Acked-by: so that the series can be
> merged in a relevant tree, most likely sound tree ?

Sure.  I will give it a review.

> 
> Thanks
> Christophe
> 
> >
> > Signed-off-by: Herve Codina 
> > ---
> >   drivers/soc/fsl/qe/Kconfig  |   12 +
> >   drivers/soc/fsl/qe/Makefile |1 +
> >   drivers/soc/fsl/qe/qmc.c| 1533
> +++
> >   include/soc/fsl/qe/qmc.h|   71 ++
> >   4 files changed, 1617 insertions(+)
> >   create mode 100644 drivers/soc/fsl/qe/qmc.c
> >   create mode 100644 include/soc/fsl/qe/qmc.h
> >
> > diff --git a/drivers/soc/fsl/qe/Kconfig b/drivers/soc/fsl/qe/Kconfig
> > index 60ec11c9f4d9..25b218351ae3 100644
> > --- a/drivers/soc/fsl/qe/Kconfig
> > +++ b/drivers/soc/fsl/qe/Kconfig
> > @@ -44,6 +44,18 @@ config CPM_TSA
> >   This option enables support for this
> >   controller
> >
> > +config CPM_QMC
> > +   tristate "CPM QMC support"
> > +   depends on OF && HAS_IOMEM
> > +   depends on CPM1 || (PPC && COMPILE_TEST)
> > +   depends on CPM_TSA
> > +   help
> > + Freescale CPM QUICC Multichannel Controller
> > + (QMC)
> > +
> > + This option enables support for this
> > + controller
> > +
> >   config QE_TDM
> > bool
> > default y if FSL_UCC_HDLC
> > diff --git a/drivers/soc/fsl/qe/Makefile b/drivers/soc/fsl/qe/Makefile
> > index 45c961acc81b..ec8506e13113 100644
> > --- a/drivers/soc/fsl/qe/Makefile
> > +++ b/drivers/soc/fsl/qe/Makefile
> > @@ -5,6 +5,7 @@
> >   obj-$(CONFIG_QUICC_ENGINE)+= qe.o qe_common.o qe_ic.o qe_io.o
> >   obj-$(CONFIG_CPM) += qe_common.o
> >   obj-$(CONFIG_CPM_TSA) += tsa.o
> > +obj-$(CONFIG_CPM_QMC)  += qmc.o
> >   obj-$(CONFIG_UCC) += ucc.o
> >   obj-$(CONFIG_UCC_SLOW)+= ucc_slow.o
> >   obj-$(CONFIG_UCC_FAST)+= ucc_fast.o
> > diff --git a/drivers/soc/fsl/qe/qmc.c b/drivers/soc/fsl/qe/qmc.c new
> > file mode 100644 index ..cfa7207353e0
> > --- /dev/null
> > +++ b/drivers/soc/fsl/qe/qmc.c
> > @@ -0,0 +1,1533 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * QMC driver
> > + *
> > + * Copyright 2022 CS GROUP France
> > + *
> > + * Author: Herve Codina   */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include "tsa.h"
> > +
> > +/* SCC general mode register high (32 bits) */
> > +#define SCC_GSMRL  0x00
> > +#define SCC_GSMRL_ENR  (1 << 5)
> > +#define SCC_GSMRL_ENT  (1 << 4)
> > +#define SCC_GSMRL_MODE_QMC (0x0A << 0)
> > +
> > +/* SCC general mode register low (32 bits) */
> > +#define SCC_GSMRH  0x04
> > +#define   SCC_GSMRH_CTSS   (1 << 7)
> > +#define   SCC_GSMRH_CDS(1 << 8)
> > +#define   SCC_GSMRH_CTSP   (1 << 9)
> > +#define   SCC_GSMRH_CDP(1 << 10)
> > +
> > +/* SCC event register (16 bits) */
> > +#define SCC_SCCE   0x10
> > +#define   SCC_SCCE_IQOV(1 << 3)
> > +#define   SCC_SCCE_GINT(1 << 2)
> > +#define   SCC_SCCE_GUN (1 << 1)
> > +#define   SCC_SCCE_GOV (1 << 0)
> > +
> > +/* SCC mask register (16 bits) */
> > +#define SCC_SCCM   0x14
> > +/* Multichannel base pointer (32 bits) */
> > +#define QMC_GBL_MCBASE 0x00
> > +/* Multichannel controller state (16 bits) */
> > +#define QMC_GBL_QMCSTATE   0x04
> > +/* Maximum receive buffer length (16 bits) */
> > +#define QMC_GBL_MRBLR  0x06
> > +/* Tx time-slot assignment table pointer (16 bits) */
> > +#define QMC_GBL_TX_S_PTR   0x08
> > +/* Rx pointer (16 bits) */
> > +#define QMC_GBL_RXPTR  0x0A
> > +/* Global receive frame threshold (16 bits) */
> > +#define QMC_GBL_GRFTHR 0x0C
> > +/* Global receive frame count (16 bits) */
> > +#define QMC_GBL_GRFCNT 0x0E

Re: [PATCH v4 00/18] gpiolib cleanups

2023-02-15 Thread Andy Shevchenko
On Wed, Feb 15, 2023 at 04:52:29PM +0100, Bartosz Golaszewski wrote:
> On Wed, Feb 8, 2023 at 6:34 PM Andy Shevchenko
>  wrote:
> >
> > These are some older patches Arnd did last year, rebased to
> > linux-next-20230208. On top there are Andy's patches regarding
> > similar topic. The series starts with Linus Walleij's patches.
> >
> > The main goal is to remove some of the legacy bits of the gpiolib
> > interfaces, where the corner cases are easily avoided or replaced
> > with gpio descriptor based interfaces.
> >
> > The idea is to get an immutable branch and route the whole series
> > via GPIO tree.
> 
> Andy,
> 
> looks like this series has all the acks it needs but I decided to not
> send it in the upcoming merge window, I'd prefer it gets some time in
> next so I'll let it sit until the next release cycle.

Ah, I forgot to mention that this is for the next cycle (v6.4).
Hence it's fine. (Moreover it's based on Linux Next, so it will
fail compilation in any tree except that one.)

I will create an immutable branch after v6.3-rc1 is out.

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH v4 00/18] gpiolib cleanups

2023-02-15 Thread Bartosz Golaszewski
On Wed, Feb 8, 2023 at 6:34 PM Andy Shevchenko
 wrote:
>
> These are some older patches Arnd did last year, rebased to
> linux-next-20230208. On top there are Andy's patches regarding
> similar topic. The series starts with Linus Walleij's patches.
>
> The main goal is to remove some of the legacy bits of the gpiolib
> interfaces, where the corner cases are easily avoided or replaced
> with gpio descriptor based interfaces.
>
> The idea is to get an immutable branch and route the whole series
> via GPIO tree.
>

Andy,

looks like this series has all the acks it needs but I decided to not
send it in the upcoming merge window, I'd prefer it gets some time in
next so I'll let it sit until the next release cycle.

Bart


Re: [PATCH v4 01/15] spi: Replace all spi->chip_select and spi->cs_gpiod references with function call

2023-02-15 Thread Mark Brown
On Sat, Feb 11, 2023 at 01:06:32AM +0530, Amit Kumar Mahapatra wrote:
> Supporting multi-cs in spi drivers would require the chip_select & cs_gpiod
> members of struct spi_device to be an array. But changing the type of these
> members to array would break the spi driver functionality. To make the
> transition smoother introduced four new APIs to get/set the
> spi->chip_select & spi->cs_gpiod and replaced all spi->chip_select and

This again doesn't apply against my current code - I think the
best thing to do here is going to be to rebase against -rc1 when
it comes out and resend then, that will also make the issues
integrating with other trees easier as then I can make a clean
branch against -rc1 that other trees will be able to merge as
needed.
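
For reference, a hedged sketch of the accessors the series introduces
(names as proposed in the patch; signatures assumed, not verified against
the final tree):

	u8 cs = spi_get_chipselect(spi, 0);	/* was: spi->chip_select */
	spi_set_chipselect(spi, 0, cs);
	struct gpio_desc *desc = spi_get_csgpiod(spi, 0);	/* was: spi->cs_gpiod */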


signature.asc
Description: PGP signature


[PATCH] usb: fix some spelling mistakes in comment

2023-02-15 Thread Zhou nan
Fix typos in comment.

Signed-off-by: Zhou nan 
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c 
b/drivers/usb/gadget/udc/fsl_udc_core.c
index a67873a074b7..da876d09fc01 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -471,7 +471,7 @@ static int dr_ep_get_stall(unsigned char ep_num, unsigned 
char dir)
 /
 
 /*--
-* struct_ep_qh_setup(): set the Endpoint Capabilites field of QH
+* struct_ep_qh_setup(): set the Endpoint Capabilities field of QH
  * @zlt: Zero Length Termination Select (1: disable; 0: enable)
  * @mult: Mult field
  --*/
@@ -483,7 +483,7 @@ static void struct_ep_qh_setup(struct fsl_udc *udc, 
unsigned char ep_num,
struct ep_queue_head *p_QH = >ep_qh[2 * ep_num + dir];
unsigned int tmp = 0;
 
-   /* set the Endpoint Capabilites in QH */
+   /* set the Endpoint Capabilities in QH */
switch (ep_type) {
case USB_ENDPOINT_XFER_CONTROL:
/* Interrupt On Setup (IOS). for control ep  */
@@ -593,7 +593,7 @@ static int fsl_ep_enable(struct usb_ep *_ep,
ep->stopped = 0;
 
/* Controller related setup */
-   /* Init EPx Queue Head (Ep Capabilites field in QH
+   /* Init EPx Queue Head (Ep Capabilities field in QH
 * according to max, zlt, mult) */
struct_ep_qh_setup(udc, (unsigned char) ep_index(ep),
(unsigned char) ((desc->bEndpointAddress & USB_DIR_IN)
@@ -1361,7 +1361,7 @@ static void ch9getstatus(struct fsl_udc *udc, u8 
request_type, u16 value,
udc->ep0_dir = USB_DIR_IN;
/* Borrow the per device status_req */
req = udc->status_req;
-   /* Fill in the reqest structure */
+   /* Fill in the request structure */
*((u16 *) req->req.buf) = cpu_to_le16(tmp);
 
req->ep = ep;
-- 
2.27.0



[PATCH AUTOSEL 6.1 15/24] powerpc: Don't select ARCH_WANTS_NO_INSTR

2023-02-15 Thread Sasha Levin
From: Michael Ellerman 

[ Upstream commit e33416fca8a2313b8650bd5807aaf34354d39a4c ]

Commit 41b7a347bf14 ("powerpc: Book3S 64-bit outline-only KASAN
support") added a select of ARCH_WANTS_NO_INSTR, because it also added
some uses of noinstr. However noinstr is always defined, regardless of
ARCH_WANTS_NO_INSTR, so there's no need to select it just for that.
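
For illustration (not part of the commit), noinstr can be used whether or
not ARCH_WANTS_NO_INSTR is selected:

	/* Placed in .noinstr.text with instrumentation disabled. */
	noinstr void example_early_entry(void)
	{
		/* must not call instrumentable code from here */
	}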

As PeterZ says [1]:
  Note that by selecting ARCH_WANTS_NO_INSTR you effectively state to
  abide by its rules.

As of now the powerpc code does not abide by those rules, and trips some
new warnings added by Peter in linux-next.

So until the code can be fixed to avoid those warnings, disable
ARCH_WANTS_NO_INSTR.

Note that ARCH_WANTS_NO_INSTR is also used to gate building KCOV and
parts of KCSAN. However none of the noinstr annotations in powerpc were
added for KCOV or KCSAN, instead instrumentation is blocked at the file
level using KCOV_INSTRUMENT_foo.o := n.

[1]: 
https://lore.kernel.org/linuxppc-dev/y9t6yoafro5yq...@hirez.programming.kicks-ass.net

Reported-by: Sachin Sant 
Suggested-by: Peter Zijlstra 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2ca5418457ed2..2b1141645d9e1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -161,7 +161,6 @@ config PPC
select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
select ARCH_WANT_LD_ORPHAN_WARN
select ARCH_WANTS_MODULES_DATA_IN_VMALLOC   if PPC_BOOK3S_32 || 
PPC_8xx
-   select ARCH_WANTS_NO_INSTR
select ARCH_WEAK_RELEASE_ACQUIRE
select BINFMT_ELF
select BUILDTIME_TABLE_SORT
-- 
2.39.0



Re: [PATCH v2 04/24] arm64/cpu: Mark cpu_die() __noreturn

2023-02-15 Thread Josh Poimboeuf
On Wed, Feb 15, 2023 at 01:09:21PM +, Mark Rutland wrote:
> On Tue, Feb 14, 2023 at 09:13:08AM +0100, Philippe Mathieu-Daudé wrote:
> > On 14/2/23 08:05, Josh Poimboeuf wrote:
> > > cpu_die() doesn't return.  Annotate it as such.  By extension this also
> > > makes arch_cpu_idle_dead() noreturn.
> > > 
> > > Signed-off-by: Josh Poimboeuf 
> > > ---
> > >   arch/arm64/include/asm/smp.h | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> > > index fc55f5a57a06..5733a31bab08 100644
> > > --- a/arch/arm64/include/asm/smp.h
> > > +++ b/arch/arm64/include/asm/smp.h
> > > @@ -100,7 +100,7 @@ static inline void arch_send_wakeup_ipi_mask(const 
> > > struct cpumask *mask)
> > >   extern int __cpu_disable(void);
> > >   extern void __cpu_die(unsigned int cpu);
> > > -extern void cpu_die(void);
> > > +extern void __noreturn cpu_die(void);
> > >   extern void cpu_die_early(void);
> > 
> > Shouldn't cpu_operations::cpu_die() be declared noreturn first?
> 
> The cpu_die() function ends with a BUG(), and so does not return, even if a
> cpu_operations::cpu_die() function that it calls erroneously returned.
> 
> We *could* mark cpu_operations::cpu_die() as noreturn, but I'd prefer that we
> did not so that the compiler doesn't optimize away the BUG() which is there to
> catch such erroneous returns.
> 
> That said, could we please add __noreturn to the implementation of cpu_die() 
> in
> arch/arm64/kernel/smp.c? i.e. the fixup below.

Done.

> With that fixup:
> 
> Acked-by: Mark Rutland 

Thanks!

-- 
Josh
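
A hedged sketch of the pattern discussed above (simplified from
arch/arm64/kernel/smp.c; details may differ):

	void __noreturn cpu_die(void)
	{
		unsigned int cpu = smp_processor_id();
		const struct cpu_operations *ops = get_cpu_ops(cpu);

		/* ... */
		ops->cpu_die(cpu);	/* the op itself is not noreturn */
		BUG();			/* catches an erroneous return */
	}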


Re: [PATCH v2 0/8] powerpc/85xx: p2020: Create one unified machine description

2023-02-15 Thread Pali Rohár
On Tuesday 14 February 2023 05:47:08 Christophe Leroy wrote:
> Le 13/02/2023 à 21:11, Pali Rohár a écrit :
> > On Monday 13 February 2023 19:58:15 Christophe Leroy wrote:
> >> Le 09/02/2023 à 01:15, Pali Rohár a écrit :
> 
>  This patch moves all p2020 boards from mpc85xx_rdb.c and mpc85xx_ds.c
>  files into a new p2020.c file, plus it copies all helper functions
>  which p2020 boards require. This patch does not introduce any new code
>  or functional change. It should be really plain copy/move.
> >>
> >> Yes after looking into it in more details, it is exactly that. You
> >> copied all helper functions but this is not said in the commit message.
> >> I think it should be said, and more important it should be explained why.
> >> Because this is exactly what I was not understanding, why I couldn't see
> >> all moved functions: just because many of them were not moved but copied.
> >>
> >> In the two first pages you made some function static, and then you
> >> duplicated it. Why ? Why not keep it global and just use it from one
> >> place to the other ?
> >>
> >> Because after patch 3 we have:
> >>
> >> arch/powerpc/platforms/85xx/mpc85xx_rdb.c:static void __init
> >> mpc85xx_rdb_pic_init(void)
> >> arch/powerpc/platforms/85xx/p2020.c:static void __init
> >> mpc85xx_rdb_pic_init(void)
> >>
> >> arch/powerpc/platforms/85xx/mpc85xx_ds.c:static void __init
> >> mpc85xx_ds_pic_init(void)
> >> arch/powerpc/platforms/85xx/p2020.c:static void __init
> >> mpc85xx_ds_pic_init(void)
> >>
> >> Why not just drop patches 1 and 2 and keep those two functions and all
> >> the other common functions like mpc85xx_8259_cascade()
> >> mpc85xx_ds_uli_init() and a lot more  in a separate common file ?
> >>
> >> Christophe
> > 
> > After applying all patches there is no mpc85xx_rdb_pic_init() /
> > mpc85xx_ds_pic_init() function in p2020.c file. There is
> > p2020_pic_init() in p2020.c but it is slightly different than previous
> > two functions.
> 
> Ok, fair enough, but then please explain in the commit message that you 
> copy the functions and then they will be re-written in following 
> patches. That way we know exactly what we are reviewing.

But it is already explained in the commit message. Isn't that enough? Or
should I rephrase some parts of the commit message?

> > 
> > Maybe it could be possible to create one function mpc85xx_pic_init() as
> > unification of previous 3 functions, but such change would be needed to
> > test on lot of mpc85xx boards, which would be affected by such change.
> > And I do not have them for testing. I have only P2020.
> 
> No, if the function are different it's better each platform has its own. 
> My comment was for functions that are exactly the same.
> 
> > 
> > So I wrote *_pic_init() function which is p2020 specific, like already
> > existing ds and rdb specific functions in their own source files.


Re: [PATCH v1 1/9] powerpc/machdep: Define 'compatible' property in ppc_md and use it

2023-02-15 Thread Nathan Lynch
Christophe Leroy  writes:

> Most probe functions do nothing else than checking whether
> the machine is compatible to a given string.
>
> Define that string in ppc_md structure and check it directly from
> probe_machine() instead of using ppc_md.probe() for that.
>
> Keep checking in ppc_md.probe() only for more complex probing.
>
> Signed-off-by: Christophe Leroy 
> ---
> v3: New
> ---
>  arch/powerpc/include/asm/machdep.h |  1 +
>  arch/powerpc/kernel/setup-common.c | 13 +++--
>  2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/machdep.h 
> b/arch/powerpc/include/asm/machdep.h
> index 378b8d5836a7..c5dfe5ff923c 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -20,6 +20,7 @@ struct pci_host_bridge;
>  
>  struct machdep_calls {
>   char*name;
> + char*compatible;

Seems like 'compatible' ought to be const char *? Possibly 'name' also.
I'm able to build ppc64le with those changes, at least.
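
A hedged sketch of the probe_machine() check the commit message describes
(illustrative; the actual patch may differ):

	/* in probe_machine(), for each candidate machine description: */
	if (machine->compatible &&
	    !of_machine_is_compatible(machine->compatible))
		continue;	/* not this platform, try the next one */
	if (machine->probe && !machine->probe())
		continue;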


[PATCH 2/3] powerpc: Fix use of '-mabi=elfv2' with clang

2023-02-15 Thread Nathan Chancellor
'-mabi=elfv2' is not added to clang's invocations when
CONFIG_PPC64_ELF_ABI_V2 is enabled, resulting in the generation of elfv1
code, as evidenced by the orphan section warnings/errors:

  ld.lld: error: vmlinux.a(arch/powerpc/kernel/prom_init.o):(.opd) is being 
placed in '.opd'
  ld.lld: error: vmlinux.a(init/main.o):(.opd) is being placed in '.opd'
  ld.lld: error: vmlinux.a(init/version.o):(.opd) is being placed in '.opd'

To resolve this, add '-mabi=elfv2' to CFLAGS with clang. This uncovers
an issue in the 32-bit vDSO:

  error: unknown target ABI 'elfv2'

The ELFv2 ABI cannot be used when building code for a 32-bit target. To
resolve this, just remove the '-mabi' flags from the assembler flags, as
it was only needed for preprocessing (the _CALL_ELF macro) but this was
cleaned up in commit 5b89492c03e5 ("powerpc: Finalise cleanup around ABI
use").

Tested-by: "Erhard F." 
---
 arch/powerpc/Makefile | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index dc4cbf0a5ca9..3f2dd930e3cd 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -123,14 +123,12 @@ endif
 endif
 
 CFLAGS-$(CONFIG_PPC64) := $(call cc-option,-mtraceback=no)
-ifndef CONFIG_CC_IS_CLANG
 ifdef CONFIG_PPC64_ELF_ABI_V2
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2,$(call 
cc-option,-mcall-aixdesc))
-AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2)
 else
+ifndef CONFIG_CC_IS_CLANG
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv1)
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcall-aixdesc)
-AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv1)
 endif
 endif
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call 
cc-option,-mminimal-toc))

-- 
2.39.2



[PATCH 3/3] powerpc: Allow CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2 with ld.lld 15+

2023-02-15 Thread Nathan Chancellor
Commit 5017b4594672 ("powerpc/64: Option to build big-endian with ELFv2
ABI") restricted the ELFv2 ABI configuration such that it can only be
selected when linking with ld.bfd, due to lack of testing with LLVM.

ld.lld can link ELFv2 kernels without any issues; in fact, it is the
only ABI that ld.lld supports, as ELFv1 is not supported in ld.lld.

As this has not seen a ton of real world testing yet, be conservative
and only allow this option to be selected with the latest stable release
of LLVM (15.x) and newer.

While in the area, remove 'default n', as it is unnecessary to specify
it explicitly since all boolean/tristate configuration symbols default
to n.

Tested-by: "Erhard F." 
---
 arch/powerpc/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b8c4ac56bddc..f9f13029c98a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -603,8 +603,7 @@ config PPC64_BIG_ENDIAN_ELF_ABI_V2
bool "Build big-endian kernel using ELF ABI V2 (EXPERIMENTAL)"
depends on PPC64 && CPU_BIG_ENDIAN
depends on CC_HAS_ELFV2
-   depends on LD_IS_BFD && LD_VERSION >= 22400
-   default n
+   depends on LD_VERSION >= 22400 || LLD_VERSION >= 15
help
  This builds the kernel image using the "Power Architecture 64-Bit ELF
  V2 ABI Specification", which has a reduced stack overhead and faster

-- 
2.39.2



[PATCH 1/3] powerpc/boot: Only use '-mabi=elfv2' with CONFIG_PPC64_BOOT_WRAPPER

2023-02-15 Thread Nathan Chancellor
When CONFIG_PPC64_ELF_ABI_V2 is enabled with clang through
CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2, building the powerpc boot wrapper
in 32-bit mode (i.e. with CONFIG_PPC64_BOOT_WRAPPER=n) fails with:

error: unknown target ABI 'elfv2'

The ABI cannot be changed with '-m32'; GCC silently accepts it but clang
errors out. Only provide '-mabi=elfv2' when CONFIG_PPC64_BOOT_WRAPPER is
enabled, which is the only way '-mabi=elfv2' will be useful.
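
A quick reproduction sketch (target triple assumed; the error text matches
the one quoted above):

	$ echo 'void f(void) {}' > test.c
	$ clang --target=powerpc-linux-gnu -mabi=elfv2 -c test.c
	error: unknown target ABI 'elfv2'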

Tested-by: "Erhard F." 
Signed-off-by: Nathan Chancellor 
---
 arch/powerpc/boot/Makefile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index d32d95aea5d6..0d4a8e8bdcab 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -44,6 +44,9 @@ BOOTCFLAGS+= -m64 -mcpu=powerpc64le
 else
 BOOTCFLAGS += -m64 -mcpu=powerpc64
 endif
+ifdef CONFIG_PPC64_ELF_ABI_V2
+BOOTCFLAGS += $(call cc-option,-mabi=elfv2)
+endif
 else
 BOOTCFLAGS += -m32 -mcpu=powerpc
 endif
@@ -55,9 +58,6 @@ BOOTCFLAGS+= -mbig-endian
 else
 BOOTCFLAGS += -mlittle-endian
 endif
-ifdef CONFIG_PPC64_ELF_ABI_V2
-BOOTCFLAGS += $(call cc-option,-mabi=elfv2)
-endif
 
 BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -nostdinc
 

-- 
2.39.2



[PATCH 0/3] Allow CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2 with ld.lld 15+

2023-02-15 Thread Nathan Chancellor
Currently, CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2 is not selectable with
ld.lld because of an explicit dependency on GNU ld, due to lack of
testing with LLVM.

Erhard was kind enough to test this option on his hardware with LLVM 15,
which ran without any issues. This should not be too surprising, as
ld.lld does not support the ELFv1 ABI at all, only ELFv2, so its ELFv2
support should be mature. With this series, big endian kernels can be built
with LLVM=1.

This series has been tested against our basic set of powerpc configurations
with clang-15, clang-16, and clang-17, but I will never be opposed to more
testing :)

The first two patches fix a couple of issues I noticed while build
testing and the final patch actually allows the option to be selected.

---
Nathan Chancellor (3):
  powerpc/boot: Only use '-mabi=elfv2' with CONFIG_PPC64_BOOT_WRAPPER
  powerpc: Fix use of '-mabi=elfv2' with clang
  powerpc: Allow CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2 with ld.lld 15+

 arch/powerpc/Kconfig   | 3 +--
 arch/powerpc/Makefile  | 4 +---
 arch/powerpc/boot/Makefile | 6 +++---
 3 files changed, 5 insertions(+), 8 deletions(-)
---
base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4
change-id: 20230118-ppc64-elfv2-llvm-39edac67bf0a

Best regards,
-- 
Nathan Chancellor 



Re: [External] [PATCH v2 00/33] Per-VMA locks

2023-02-15 Thread Suren Baghdasaryan
On Wed, Feb 15, 2023 at 9:33 AM Punit Agrawal
 wrote:
>
> Suren Baghdasaryan  writes:
>
> > Previous version:
> > v1: https://lore.kernel.org/all/20230109205336.3665937-1-sur...@google.com/
> > RFC: https://lore.kernel.org/all/20220901173516.702122-1-sur...@google.com/
> >
> > LWN article describing the feature:
> > https://lwn.net/Articles/906852/
> >
> > The per-VMA locks idea was discussed during the SPF [1] discussion at LSF/MM
> > last year [2], which concluded with the suggestion that “a reader/writer
> > semaphore could be put into the VMA itself; that would have the effect of
> > using the VMA as a sort of range lock. There would still be contention at
> > the VMA level, but it would be an improvement.” This patchset implements
> > that suggested approach.
>
> I took the patches for a spin on a 2-socket 32 core (64 threads) system
> with Intel 8336C (Ice Lake) and 512GB of RAM.
>
> For the initial testing, "pft-threads" from the mm-tests suite[0] was
> used. The test mmaps a memory region (~100GB on the test system) and
> triggers access by a number of threads executing in parallel. For each
> degree of parallelism, the test is repeated 10 times to get a better
> feel for the behaviour. Below is an excerpt of the harmonic mean
> reported by the 'compare-kernels' script[1] included with mm-tests.
>
> The first column is results for mm-unstable as of 2023-02-10, the second
> column is the patches posted here while the third column includes
> optimizations to reclaim some of the observed regression.
>
> From the results, there is a drop in page faults/second for a low number of
> CPUs but a good improvement with higher CPU counts.
>
>                             6.2.0-rc4              6.2.0-rc4              6.2.0-rc4
>                  mm-unstable-20230210                 pvl-v2             pvl-v2+opt
>
> Hmean     faults/cpu-1    898792.9338 (   0.00%)   894597.0474 *  -0.47%*   895933.2782 *  -0.32%*
> Hmean     faults/cpu-4    751903.9803 (   0.00%)   677764.2975 *  -9.86%*   688643.8163 *  -8.41%*
> Hmean     faults/cpu-7    612275.5663 (   0.00%)   565363.4137 *  -7.66%*   597538.9396 *  -2.41%*
> Hmean     faults/cpu-12   434460.9074 (   0.00%)   410974.2708 *  -5.41%*   452501.4290 *   4.15%*
> Hmean     faults/cpu-21   291475.5165 (   0.00%)   293936.8460 (   0.84%)   308712.2434 *   5.91%*
> Hmean     faults/cpu-30   218021.3980 (   0.00%)   228265.0559 *   4.70%*   241897.5225 *  10.95%*
> Hmean     faults/cpu-48   141798.5030 (   0.00%)   162322.5972 *  14.47%*   166081.9459 *  17.13%*
> Hmean     faults/cpu-79    90060.9577 (   0.00%)   107028.7779 *  18.84%*   109810.4488 *  21.93%*
> Hmean     faults/cpu-110   64729.3561 (   0.00%)    80597.7246 *  24.51%*    83134.0679 *  28.43%*
> Hmean     faults/cpu-128   55740.1334 (   0.00%)    68395.4426 *  22.70%*    69248.2836 *  24.23%*
>
> Hmean     faults/sec-1    898781.7694 (   0.00%)   894247.3174 *  -0.50%*   894440.3118 *  -0.48%*
> Hmean     faults/sec-4   2965588.9697 (   0.00%)  2683651.5664 *  -9.51%*  2726450.9710 *  -8.06%*
> Hmean     faults/sec-7   4144512.3996 (   0.00%)  3891644.2128 *  -6.10%*  4099918.8601 (  -1.08%)
> Hmean     faults/sec-12  4969513.6934 (   0.00%)  4829731.4355 *  -2.81%*  5264682.7371 *   5.94%*
> Hmean     faults/sec-21  5814379.4789 (   0.00%)  5941405.3116 *   2.18%*  6263716.3903 *   7.73%*
> Hmean     faults/sec-30  6153685.3709 (   0.00%)  6489311.6634 *   5.45%*  6910843.5858 *  12.30%*
> Hmean     faults/sec-48  6197953.1327 (   0.00%)  7216320.7727 *  16.43%*  7412782.2927 *  19.60%*
> Hmean     faults/sec-79  6167135.3738 (   0.00%)  7425927.1022 *  20.41%*  7637042.2198 *  23.83%*
> Hmean     faults/sec-110 6264768.2247 (   0.00%)  7813329.3863 *  24.72%*  7984344.4005 *  27.45%*
> Hmean     faults/sec-128 6460727.8216 (   0.00%)  7875664.8999 *  21.90%*  8049910.3601 *  24.60%*

Thanks for summarizing the findings, Punit! So, it looks like the latest
fixes I sent you for testing (pvl-v2+opt) bring the regression down
quite a bit. The faults/sec-4 case is still regressing, but the rest look
quite good. I'll incorporate those fixes and post v3 shortly. Thanks!

>
> [0] https://github.com/gormanm/mmtests
> [1] https://github.com/gormanm/mmtests/blob/master/compare-kernels.sh


Re: [External] [PATCH v2 00/33] Per-VMA locks

2023-02-15 Thread Punit Agrawal
Suren Baghdasaryan  writes:

> Previous version:
> v1: https://lore.kernel.org/all/20230109205336.3665937-1-sur...@google.com/
> RFC: https://lore.kernel.org/all/20220901173516.702122-1-sur...@google.com/
>
> LWN article describing the feature:
> https://lwn.net/Articles/906852/
>
> The per-VMA locks idea was discussed during the SPF [1] discussion at LSF/MM
> last year [2], which concluded with the suggestion that “a reader/writer
> semaphore could be put into the VMA itself; that would have the effect of
> using the VMA as a sort of range lock. There would still be contention at
> the VMA level, but it would be an improvement.” This patchset implements
> that suggested approach.

I took the patches for a spin on a 2-socket 32 core (64 threads) system
with Intel 8336C (Ice Lake) and 512GB of RAM.

For the initial testing, "pft-threads" from the mm-tests suite[0] was
used. The test mmaps a memory region (~100GB on the test system) and
triggers access by a number of threads executing in parallel. For each
degree of parallelism, the test is repeated 10 times to get a better
feel for the behaviour. Below is an excerpt of the harmonic mean
reported by the 'compare-kernels' script[1] included with mm-tests.

The first column is results for mm-unstable as of 2023-02-10, the second
column is the patches posted here while the third column includes
optimizations to reclaim some of the observed regression.

From the results, there is a drop in page faults/second for a low number of
CPUs but a good improvement with higher CPU counts.

                            6.2.0-rc4              6.2.0-rc4              6.2.0-rc4
                 mm-unstable-20230210                 pvl-v2             pvl-v2+opt

Hmean     faults/cpu-1    898792.9338 (   0.00%)   894597.0474 *  -0.47%*   895933.2782 *  -0.32%*
Hmean     faults/cpu-4    751903.9803 (   0.00%)   677764.2975 *  -9.86%*   688643.8163 *  -8.41%*
Hmean     faults/cpu-7    612275.5663 (   0.00%)   565363.4137 *  -7.66%*   597538.9396 *  -2.41%*
Hmean     faults/cpu-12   434460.9074 (   0.00%)   410974.2708 *  -5.41%*   452501.4290 *   4.15%*
Hmean     faults/cpu-21   291475.5165 (   0.00%)   293936.8460 (   0.84%)   308712.2434 *   5.91%*
Hmean     faults/cpu-30   218021.3980 (   0.00%)   228265.0559 *   4.70%*   241897.5225 *  10.95%*
Hmean     faults/cpu-48   141798.5030 (   0.00%)   162322.5972 *  14.47%*   166081.9459 *  17.13%*
Hmean     faults/cpu-79    90060.9577 (   0.00%)   107028.7779 *  18.84%*   109810.4488 *  21.93%*
Hmean     faults/cpu-110   64729.3561 (   0.00%)    80597.7246 *  24.51%*    83134.0679 *  28.43%*
Hmean     faults/cpu-128   55740.1334 (   0.00%)    68395.4426 *  22.70%*    69248.2836 *  24.23%*

Hmean     faults/sec-1    898781.7694 (   0.00%)   894247.3174 *  -0.50%*   894440.3118 *  -0.48%*
Hmean     faults/sec-4   2965588.9697 (   0.00%)  2683651.5664 *  -9.51%*  2726450.9710 *  -8.06%*
Hmean     faults/sec-7   4144512.3996 (   0.00%)  3891644.2128 *  -6.10%*  4099918.8601 (  -1.08%)
Hmean     faults/sec-12  4969513.6934 (   0.00%)  4829731.4355 *  -2.81%*  5264682.7371 *   5.94%*
Hmean     faults/sec-21  5814379.4789 (   0.00%)  5941405.3116 *   2.18%*  6263716.3903 *   7.73%*
Hmean     faults/sec-30  6153685.3709 (   0.00%)  6489311.6634 *   5.45%*  6910843.5858 *  12.30%*
Hmean     faults/sec-48  6197953.1327 (   0.00%)  7216320.7727 *  16.43%*  7412782.2927 *  19.60%*
Hmean     faults/sec-79  6167135.3738 (   0.00%)  7425927.1022 *  20.41%*  7637042.2198 *  23.83%*
Hmean     faults/sec-110 6264768.2247 (   0.00%)  7813329.3863 *  24.72%*  7984344.4005 *  27.45%*
Hmean     faults/sec-128 6460727.8216 (   0.00%)  7875664.8999 *  21.90%*  8049910.3601 *  24.60%*

[0] https://github.com/gormanm/mmtests
[1] https://github.com/gormanm/mmtests/blob/master/compare-kernels.sh


[PATCH] macintosh: windfarm: Use unsigned type for 1-bit bitfields

2023-02-15 Thread Nathan Chancellor
Clang warns:

  drivers/macintosh/windfarm_lm75_sensor.c:63:14: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
          lm->inited = 1;
                     ^ ~

  drivers/macintosh/windfarm_smu_sensors.c:356:19: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
          pow->fake_volts = 1;
                          ^ ~
  drivers/macintosh/windfarm_smu_sensors.c:368:18: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
          pow->quadratic = 1;
                         ^ ~

There is no bug here, since no code checks the actual value of these
fields, only whether or not they are zero (boolean context), but this
can easily be fixed by switching to an unsigned type.
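
As a standalone illustration of the warning (hypothetical example, not
part of this patch; with GCC and Clang a plain 'int' bit-field is
signed):

  #include <stdio.h>

  struct flags {
          int          s : 1;     /* signed 1-bit field: holds only -1 or 0 */
          unsigned int u : 1;     /* unsigned 1-bit field: holds 0 or 1 */
  };

  int main(void)
  {
          struct flags f = { .s = 1, .u = 1 };    /* storing 1 in 's' yields -1 */

          printf("s = %d, u = %u\n", f.s, f.u);   /* prints: s = -1, u = 1 */
          printf("%d\n", f.s == 1);               /* 0: comparison against 1 fails */
          printf("%d\n", f.s ? 1 : 0);            /* 1: boolean checks still work */
          return 0;
  }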

Signed-off-by: Nathan Chancellor 
---
 drivers/macintosh/windfarm_lm75_sensor.c | 4 ++--
 drivers/macintosh/windfarm_smu_sensors.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/macintosh/windfarm_lm75_sensor.c 
b/drivers/macintosh/windfarm_lm75_sensor.c
index 24f0a444d312..9c6febce2376 100644
--- a/drivers/macintosh/windfarm_lm75_sensor.c
+++ b/drivers/macintosh/windfarm_lm75_sensor.c
@@ -33,8 +33,8 @@
 #endif
 
 struct wf_lm75_sensor {
-   int ds1775 : 1;
-   int inited : 1;
+   unsigned intds1775 : 1;
+   unsigned intinited : 1;
struct i2c_client   *i2c;
struct wf_sensorsens;
 };
diff --git a/drivers/macintosh/windfarm_smu_sensors.c 
b/drivers/macintosh/windfarm_smu_sensors.c
index 00c6fe25fcba..2bdb73b34d29 100644
--- a/drivers/macintosh/windfarm_smu_sensors.c
+++ b/drivers/macintosh/windfarm_smu_sensors.c
@@ -274,8 +274,8 @@ struct smu_cpu_power_sensor {
struct list_headlink;
struct wf_sensor*volts;
struct wf_sensor*amps;
-   int fake_volts : 1;
-   int quadratic : 1;
+   unsigned intfake_volts : 1;
+   unsigned intquadratic : 1;
struct wf_sensorsens;
 };
 #define to_smu_cpu_power(c) container_of(c, struct smu_cpu_power_sensor, sens)

---
base-commit: ceaa837f96adb69c0df0397937cd74991d5d821a
change-id: 
20230215-windfarm-wsingle-bit-bitfield-constant-conversion-324fed6e2342

Best regards,
-- 
Nathan Chancellor 



Re: [PATCH v4 02/10] soc: fsl: cpm1: Add support for TSA

2023-02-15 Thread Christophe Leroy
Hi Li and Qiang,

On 26/01/2023 at 09:32, Herve Codina wrote:
> The TSA (Time Slot Assigner) purpose is to route some
> TDM time-slots to other internal serial controllers.
> 
> It is available in some PowerQUICC SoCs such as the
> MPC885 or MPC866.
> 
> It is also available on some Quicc Engine SoCs.
> This current version supports CPM1 SoCs only, and some
> enhancements are needed to support QUICC Engine SoCs.

Do you have any comments on this other patch?

Otherwise, may I ask if you can send an Acked-by: so that the series can
be merged through a relevant tree, most likely the sound tree?

Thanks
Christophe

> 
> Signed-off-by: Herve Codina 
> ---
>   drivers/soc/fsl/qe/Kconfig  |  11 +
>   drivers/soc/fsl/qe/Makefile |   1 +
>   drivers/soc/fsl/qe/tsa.c| 864 
>   drivers/soc/fsl/qe/tsa.h|  42 ++
>   4 files changed, 918 insertions(+)
>   create mode 100644 drivers/soc/fsl/qe/tsa.c
>   create mode 100644 drivers/soc/fsl/qe/tsa.h
> 
> diff --git a/drivers/soc/fsl/qe/Kconfig b/drivers/soc/fsl/qe/Kconfig
> index 357c5800b112..60ec11c9f4d9 100644
> --- a/drivers/soc/fsl/qe/Kconfig
> +++ b/drivers/soc/fsl/qe/Kconfig
> @@ -33,6 +33,17 @@ config UCC
>   bool
>   default y if UCC_FAST || UCC_SLOW
>   
> +config CPM_TSA
> + tristate "CPM TSA support"
> + depends on OF && HAS_IOMEM
> + depends on CPM1 || (PPC && COMPILE_TEST)
> + help
> +   Freescale CPM Time Slot Assigner (TSA)
> +   controller.
> +
> +   This option enables support for this
> +   controller
> +
>   config QE_TDM
>   bool
>   default y if FSL_UCC_HDLC
> diff --git a/drivers/soc/fsl/qe/Makefile b/drivers/soc/fsl/qe/Makefile
> index 55a555304f3a..45c961acc81b 100644
> --- a/drivers/soc/fsl/qe/Makefile
> +++ b/drivers/soc/fsl/qe/Makefile
> @@ -4,6 +4,7 @@
>   #
>   obj-$(CONFIG_QUICC_ENGINE)+= qe.o qe_common.o qe_ic.o qe_io.o
>   obj-$(CONFIG_CPM)   += qe_common.o
> +obj-$(CONFIG_CPM_TSA)+= tsa.o
>   obj-$(CONFIG_UCC)   += ucc.o
>   obj-$(CONFIG_UCC_SLOW)  += ucc_slow.o
>   obj-$(CONFIG_UCC_FAST)  += ucc_fast.o
> diff --git a/drivers/soc/fsl/qe/tsa.c b/drivers/soc/fsl/qe/tsa.c
> new file mode 100644
> index ..91b4c89fa5b3
> --- /dev/null
> +++ b/drivers/soc/fsl/qe/tsa.c
> @@ -0,0 +1,864 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * TSA driver
> + *
> + * Copyright 2022 CS GROUP France
> + *
> + * Author: Herve Codina 
> + */
> +
> +#include "tsa.h"
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +
> +/* TSA SI RAM routing tables entry */
> +#define TSA_SIRAM_ENTRY_LAST (1 << 16)
> +#define TSA_SIRAM_ENTRY_BYTE (1 << 17)
> +#define TSA_SIRAM_ENTRY_CNT(x)   (((x) & 0x0f) << 18)
> +#define TSA_SIRAM_ENTRY_CSEL_MASK	(0x7 << 22)
> +#define TSA_SIRAM_ENTRY_CSEL_NU  (0x0 << 22)
> +#define TSA_SIRAM_ENTRY_CSEL_SCC2(0x2 << 22)
> +#define TSA_SIRAM_ENTRY_CSEL_SCC3(0x3 << 22)
> +#define TSA_SIRAM_ENTRY_CSEL_SCC4(0x4 << 22)
> +#define TSA_SIRAM_ENTRY_CSEL_SMC1(0x5 << 22)
> +#define TSA_SIRAM_ENTRY_CSEL_SMC2(0x6 << 22)
> +
> +/* SI mode register (32 bits) */
> +#define TSA_SIMODE   0x00
> +#define   TSA_SIMODE_SMC2	0x8000
> +#define   TSA_SIMODE_SMC1	0x8000
> +#define   TSA_SIMODE_TDMA(x) ((x) << 0)
> +#define   TSA_SIMODE_TDMB(x) ((x) << 16)
> +#define TSA_SIMODE_TDM_MASK  0x0fff
> +#define TSA_SIMODE_TDM_SDM_MASK  0x0c00
> +#define   TSA_SIMODE_TDM_SDM_NORM	0x
> +#define   TSA_SIMODE_TDM_SDM_ECHO	0x0400
> +#define   TSA_SIMODE_TDM_SDM_INTL_LOOP   0x0800
> +#define   TSA_SIMODE_TDM_SDM_LOOP_CTRL   0x0c00
> +#define TSA_SIMODE_TDM_RFSD(x)   ((x) << 8)
> +#define TSA_SIMODE_TDM_DSC   0x0080
> +#define TSA_SIMODE_TDM_CRT   0x0040
> +#define TSA_SIMODE_TDM_STZ   0x0020
> +#define TSA_SIMODE_TDM_CE0x0010
> +#define TSA_SIMODE_TDM_FE0x0008
> +#define TSA_SIMODE_TDM_GM0x0004
> +#define TSA_SIMODE_TDM_TFSD(x)   ((x) << 0)
> +
> +/* SI global mode register (8 bits) */
> +#define TSA_SIGMR	0x04
> +#define TSA_SIGMR_ENB	(1<<3)
> +#define TSA_SIGMR_ENA	(1<<2)
> +#define TSA_SIGMR_RDM_MASK   0x03
> +#define   TSA_SIGMR_RDM_STATIC_TDMA  0x00
> +#define   TSA_SIGMR_RDM_DYN_TDMA 0x01
> +#define   TSA_SIGMR_RDM_STATIC_TDMAB 0x02
> +#define   TSA_SIGMR_RDM_DYN_TDMAB	0x03
> +
> +/* SI status register (8 bits) */
> +#define TSA_SISTR	0x06
> +
> +/* SI command register (8 bits) */
> +#define TSA_SICMR	0x07
> +
> +/* SI clock route register (32 bits) */
> +#define TSA_SICR 0x0C
> +#define   TSA_SICR_SCC2(x)   ((x) << 8)
> 

Re: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC

2023-02-15 Thread Christophe Leroy
Hi Li and Qiang,

On 26/01/2023 at 09:32, Herve Codina wrote:
> The QMC (QUICC Multichannel Controller) emulates up to 64
> channels within one serial controller using the same TDM
> physical interface routed from the TSA.
> 
> It is available in some PowerQUICC SoCs such as the
> MPC885 or MPC866.
> 
> It is also available on some Quicc Engine SoCs.
> This current version supports CPM1 SoCs only, and some
> enhancements are needed to support QUICC Engine SoCs.

Do you have any comments on this patch?

Otherwise, may I ask you to send your Acked-by: so that the series can
be merged through a relevant tree, most likely the sound tree?

Thanks
Christophe

> 
> Signed-off-by: Herve Codina 
> ---
>   drivers/soc/fsl/qe/Kconfig  |   12 +
>   drivers/soc/fsl/qe/Makefile |1 +
>   drivers/soc/fsl/qe/qmc.c| 1533 +++
>   include/soc/fsl/qe/qmc.h|   71 ++
>   4 files changed, 1617 insertions(+)
>   create mode 100644 drivers/soc/fsl/qe/qmc.c
>   create mode 100644 include/soc/fsl/qe/qmc.h
> 
> diff --git a/drivers/soc/fsl/qe/Kconfig b/drivers/soc/fsl/qe/Kconfig
> index 60ec11c9f4d9..25b218351ae3 100644
> --- a/drivers/soc/fsl/qe/Kconfig
> +++ b/drivers/soc/fsl/qe/Kconfig
> @@ -44,6 +44,18 @@ config CPM_TSA
> This option enables support for this
> controller
>   
> +config CPM_QMC
> + tristate "CPM QMC support"
> + depends on OF && HAS_IOMEM
> + depends on CPM1 || (PPC && COMPILE_TEST)
> + depends on CPM_TSA
> + help
> +   Freescale CPM QUICC Multichannel Controller
> +   (QMC)
> +
> +   This option enables support for this
> +   controller
> +
>   config QE_TDM
>   bool
>   default y if FSL_UCC_HDLC
> diff --git a/drivers/soc/fsl/qe/Makefile b/drivers/soc/fsl/qe/Makefile
> index 45c961acc81b..ec8506e13113 100644
> --- a/drivers/soc/fsl/qe/Makefile
> +++ b/drivers/soc/fsl/qe/Makefile
> @@ -5,6 +5,7 @@
>   obj-$(CONFIG_QUICC_ENGINE)+= qe.o qe_common.o qe_ic.o qe_io.o
>   obj-$(CONFIG_CPM)   += qe_common.o
>   obj-$(CONFIG_CPM_TSA)   += tsa.o
> +obj-$(CONFIG_CPM_QMC)+= qmc.o
>   obj-$(CONFIG_UCC)   += ucc.o
>   obj-$(CONFIG_UCC_SLOW)  += ucc_slow.o
>   obj-$(CONFIG_UCC_FAST)  += ucc_fast.o
> diff --git a/drivers/soc/fsl/qe/qmc.c b/drivers/soc/fsl/qe/qmc.c
> new file mode 100644
> index ..cfa7207353e0
> --- /dev/null
> +++ b/drivers/soc/fsl/qe/qmc.c
> @@ -0,0 +1,1533 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * QMC driver
> + *
> + * Copyright 2022 CS GROUP France
> + *
> + * Author: Herve Codina 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "tsa.h"
> +
> +/* SCC general mode register low (32 bits) */
> +#define SCC_GSMRL	0x00
> +#define SCC_GSMRL_ENR	(1 << 5)
> +#define SCC_GSMRL_ENT	(1 << 4)
> +#define SCC_GSMRL_MODE_QMC	(0x0A << 0)
> +
> +/* SCC general mode register high (32 bits) */
> +#define SCC_GSMRH	0x04
> +#define   SCC_GSMRH_CTSS (1 << 7)
> +#define   SCC_GSMRH_CDS  (1 << 8)
> +#define   SCC_GSMRH_CTSP (1 << 9)
> +#define   SCC_GSMRH_CDP  (1 << 10)
> +
> +/* SCC event register (16 bits) */
> +#define SCC_SCCE 0x10
> +#define   SCC_SCCE_IQOV  (1 << 3)
> +#define   SCC_SCCE_GINT  (1 << 2)
> +#define   SCC_SCCE_GUN   (1 << 1)
> +#define   SCC_SCCE_GOV   (1 << 0)
> +
> +/* SCC mask register (16 bits) */
> +#define SCC_SCCM 0x14
> +/* Multichannel base pointer (32 bits) */
> +#define QMC_GBL_MCBASE   0x00
> +/* Multichannel controller state (16 bits) */
> +#define QMC_GBL_QMCSTATE 0x04
> +/* Maximum receive buffer length (16 bits) */
> +#define QMC_GBL_MRBLR	0x06
> +/* Tx time-slot assignment table pointer (16 bits) */
> +#define QMC_GBL_TX_S_PTR 0x08
> +/* Rx pointer (16 bits) */
> +#define QMC_GBL_RXPTR	0x0A
> +/* Global receive frame threshold (16 bits) */
> +#define QMC_GBL_GRFTHR   0x0C
> +/* Global receive frame count (16 bits) */
> +#define QMC_GBL_GRFCNT   0x0E
> +/* Multichannel interrupt base address (32 bits) */
> +#define QMC_GBL_INTBASE  0x10
> +/* Multichannel interrupt pointer (32 bits) */
> +#define QMC_GBL_INTPTR   0x14
> +/* Rx time-slot assignment table pointer (16 bits) */
> +#define QMC_GBL_RX_S_PTR 0x18
> +/* Tx pointer (16 bits) */
> +#define QMC_GBL_TXPTR	0x1A
> +/* CRC constant (32 bits) */
> +#define QMC_GBL_C_MASK32 0x1C
> +/* Time slot assignment table Rx (32 x 16 bits) */
> +#define QMC_GBL_TSATRX   0x20
> +/* Time slot assignment table Tx (32 x 16 bits) */
> +#define QMC_GBL_TSATTX   0x60
> +/* CRC constant (16 bits) */
> +#define QMC_GBL_C_MASK16 0xA0
> +
> +/* TSA entry (16bit entry in TSATRX and TSATTX) */
> +#define QMC_TSA_VALID 

Re: [PATCH v2] tools/perf/tests: Change true workload to sleep workload in all metric test for system wide check

2023-02-15 Thread Ian Rogers
On Wed, Feb 15, 2023 at 1:38 AM Kajol Jain  wrote:
>
> Testcase stat_all_metrics.sh fails in powerpc:
>
> 98: perf all metrics test : FAILED!
>
> Logs with verbose:
>
> [command]# ./perf test 98 -vv
>  98: perf all metrics test   :
>  --- start ---
> test child forked, pid 13262
> Testing BRU_STALL_CPI
> Testing COMPLETION_STALL_CPI
>  
> Testing TOTAL_LOCAL_NODE_PUMPS_P23
> Metric 'TOTAL_LOCAL_NODE_PUMPS_P23' not printed in:
> Error:
> Invalid event (hv_24x7/PM_PB_LNS_PUMP23,chip=3/) in per-thread mode, enable system wide with '-a'.
> Testing TOTAL_LOCAL_NODE_PUMPS_RETRIES_P01
> Metric 'TOTAL_LOCAL_NODE_PUMPS_RETRIES_P01' not printed in:
> Error:
> Invalid event (hv_24x7/PM_PB_RTY_LNS_PUMP01,chip=3/) in per-thread mode, enable system wide with '-a'.
>  
>
> From the above logs, we can see that some of the hv_24x7 metric events fail,
> and the logs suggest running the metric events with the -a option.
> This change happened after commit a4b8cfcabb1d ("perf stat: Delay metric
> parsing"), which delayed the metric parsing phase; now, before the metric
> parsing phase, the perf tool identifies whether the target is system-wide or
> not. With this change, perf_event_open fails for uncore events under workload
> monitoring, as expected.
>
> The perf all metrics test case fails as some of the hv_24x7 metric events
> may need a bigger workload with system-wide monitoring to get the data.
> Fix this issue by changing the current system-wide check from a 'true'
> workload to a 'sleep 0.01' workload.
>
> Result with the patch changes in powerpc:
>
> 98: perf all metrics test : Ok
>
> Reviewed-by: Athira Rajeev 
> Tested-by: Disha Goel 
> Suggested-by: Ian Rogers 
> Signed-off-by: Kajol Jain 

Tested-by: Ian Rogers 

The mention of a4b8cfcabb1d can be moved to a Fixes tag so that this
can be backported.

Thanks,
Ian

> ---
> Changelog:
>
> v1->v2:
> - Addressed review comments from Ian by changing the true workload
>   to a sleep workload in "perf all metrics test", rather than adding
>   a new system-wide check with a perf bench workload.
> - Added Reviewed-by, Tested-by and Suggested-by tags.
>
>  tools/perf/tests/shell/stat_all_metrics.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/tests/shell/stat_all_metrics.sh 
> b/tools/perf/tests/shell/stat_all_metrics.sh
> index 6e79349e42be..22e9cb294b40 100755
> --- a/tools/perf/tests/shell/stat_all_metrics.sh
> +++ b/tools/perf/tests/shell/stat_all_metrics.sh
> @@ -11,7 +11,7 @@ for m in $(perf list --raw-dump metrics); do
>  continue
>fi
># Failed so try system wide.
> -  result=$(perf stat -M "$m" -a true 2>&1)
> +  result=$(perf stat -M "$m" -a sleep 0.01 2>&1)
>if [[ "$result" =~ "${m:0:50}" ]]
>then
>  continue
> --
> 2.39.1
>


[PATCH v8 3/3] riscv: Check relocations at compile time

2023-02-15 Thread Alexandre Ghiti
From: Alexandre Ghiti 

Relocating the kernel at runtime is done very early in the boot process,
so it is not convenient to check for relocations there and react when a
relocation is unexpected.

A script in scripts/ extracts the relocations from vmlinux; it is then
used at postlink time to check them.

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Anup Patel 
---
 arch/riscv/Makefile.postlink | 36 
 arch/riscv/tools/relocs_check.sh | 26 +++
 2 files changed, 62 insertions(+)
 create mode 100644 arch/riscv/Makefile.postlink
 create mode 100755 arch/riscv/tools/relocs_check.sh

diff --git a/arch/riscv/Makefile.postlink b/arch/riscv/Makefile.postlink
new file mode 100644
index ..bf2b2bca1845
--- /dev/null
+++ b/arch/riscv/Makefile.postlink
@@ -0,0 +1,36 @@
+# SPDX-License-Identifier: GPL-2.0
+# ===
+# Post-link riscv pass
+# ===
+#
+# Check that vmlinux relocations look sane
+
+PHONY := __archpost
+__archpost:
+
+-include include/config/auto.conf
+include scripts/Kbuild.include
+
+quiet_cmd_relocs_check = CHKREL  $@
+cmd_relocs_check = \
+	$(CONFIG_SHELL) $(srctree)/arch/riscv/tools/relocs_check.sh "$(OBJDUMP)" "$(NM)" "$@"
+
+# `@true` prevents complaint when there is nothing to be done
+
+vmlinux: FORCE
+   @true
+ifdef CONFIG_RELOCATABLE
+   $(call if_changed,relocs_check)
+endif
+
+%.ko: FORCE
+   @true
+
+clean:
+   @true
+
+PHONY += FORCE clean
+
+FORCE:
+
+.PHONY: $(PHONY)
diff --git a/arch/riscv/tools/relocs_check.sh b/arch/riscv/tools/relocs_check.sh
new file mode 100755
index ..baeb2e7b2290
--- /dev/null
+++ b/arch/riscv/tools/relocs_check.sh
@@ -0,0 +1,26 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Based on powerpc relocs_check.sh
+
+# This script checks the relocations of a vmlinux for "suspicious"
+# relocations.
+
+if [ $# -lt 3 ]; then
+echo "$0 [path to objdump] [path to nm] [path to vmlinux]" 1>&2
+exit 1
+fi
+
+bad_relocs=$(
+${srctree}/scripts/relocs_check.sh "$@" |
+   # These relocations are okay
+   #   R_RISCV_RELATIVE
+   grep -F -w -v 'R_RISCV_RELATIVE'
+)
+
+if [ -z "$bad_relocs" ]; then
+   exit 0
+fi
+
+num_bad=$(echo "$bad_relocs" | wc -l)
+echo "WARNING: $num_bad bad relocations"
+echo "$bad_relocs"
-- 
2.37.2



[PATCH v8 2/3] powerpc: Move script to check relocations at compile time in scripts/

2023-02-15 Thread Alexandre Ghiti
From: Alexandre Ghiti 

Relocating the kernel at runtime is done very early in the boot process,
so it is not convenient to check for relocations there and react when a
relocation is unexpected.

The powerpc architecture has a script that allows checking at compile
time for such unexpected relocations: extract the common logic into
scripts/ so that other architectures can take advantage of it.

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Anup Patel 
Acked-by: Michael Ellerman  (powerpc)
---
 arch/powerpc/tools/relocs_check.sh | 18 ++
 scripts/relocs_check.sh| 20 
 2 files changed, 22 insertions(+), 16 deletions(-)
 create mode 100755 scripts/relocs_check.sh

diff --git a/arch/powerpc/tools/relocs_check.sh 
b/arch/powerpc/tools/relocs_check.sh
index 63792af00417..6b350e75014c 100755
--- a/arch/powerpc/tools/relocs_check.sh
+++ b/arch/powerpc/tools/relocs_check.sh
@@ -15,21 +15,8 @@ if [ $# -lt 3 ]; then
exit 1
 fi
 
-# Have Kbuild supply the path to objdump and nm so we handle cross compilation.
-objdump="$1"
-nm="$2"
-vmlinux="$3"
-
-# Remove from the bad relocations those that match an undefined weak symbol
-# which will result in an absolute relocation to 0.
-# Weak unresolved symbols are of that form in nm output:
-# "  w _binary__btf_vmlinux_bin_end"
-undef_weak_symbols=$($nm "$vmlinux" | awk '$1 ~ /w/ { print $2 }')
-
 bad_relocs=$(
-$objdump -R "$vmlinux" |
-   # Only look at relocation lines.
-   grep -E '\

[PATCH v8 1/3] riscv: Introduce CONFIG_RELOCATABLE

2023-02-15 Thread Alexandre Ghiti
From: Alexandre Ghiti 

This config allows compiling the 64-bit kernel as a PIE and relocating
it to any virtual address at runtime: this paves the way to KASLR.
Runtime relocation is possible since relocation metadata are embedded
into the kernel.

Note that relocating at runtime introduces an overhead even if the
kernel is loaded at the same address it was linked at, and that the
compiler options are those used by arm64, which uses the same RELA
relocation format.
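
As a rough illustration of what the runtime relocation pass does, a
minimal sketch follows (simplified and hypothetical, not this patch's
relocate_kernel(); it assumes the __rela_dyn_start/__rela_dyn_end
symbols added to the linker script below, and R_RISCV_RELATIVE from
asm/elf.h):

  #include <linux/elf.h>
  #include <linux/init.h>

  extern char __rela_dyn_start[], __rela_dyn_end[];  /* from vmlinux.lds.S */

  static void __init relocate_kernel_sketch(unsigned long linked_va,
                                            unsigned long loaded_va)
  {
          Elf64_Rela *rela = (Elf64_Rela *)__rela_dyn_start;
          Elf64_Rela *end  = (Elf64_Rela *)__rela_dyn_end;
          unsigned long off = loaded_va - linked_va;

          for (; rela < end; rela++) {
                  unsigned long *ptr;

                  /* only R_RISCV_RELATIVE entries are expected, see patch 3 */
                  if (ELF64_R_TYPE(rela->r_info) != R_RISCV_RELATIVE)
                          continue;

                  /* r_offset and r_addend hold link-time addresses: rebase both */
                  ptr = (unsigned long *)(rela->r_offset + off);
                  *ptr = rela->r_addend + off;
          }
  }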

Signed-off-by: Alexandre Ghiti 
---
 arch/riscv/Kconfig  | 14 +
 arch/riscv/Makefile |  7 +++--
 arch/riscv/kernel/efi-header.S  |  6 ++--
 arch/riscv/kernel/vmlinux.lds.S | 10 --
 arch/riscv/mm/Makefile  |  4 +++
 arch/riscv/mm/init.c| 54 -
 6 files changed, 87 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e2b656043abf..e0ee7ce4b2e3 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -544,6 +544,20 @@ config COMPAT
 
  If you want to execute 32-bit userspace applications, say Y.
 
+config RELOCATABLE
+   bool "Build a relocatable kernel"
+   depends on MMU && 64BIT && !XIP_KERNEL
+   help
+  This builds a kernel as a Position Independent Executable (PIE),
+  which retains all relocation metadata required to relocate the
+  kernel binary at runtime to a different virtual address than the
+  address it was linked at.
+  Since RISCV uses the RELA relocation format, this requires a
+  relocation pass at runtime even if the kernel is loaded at the
+  same address it was linked at.
+
+  If unsure, say N.
+
 endmenu # "Kernel features"
 
 menu "Boot options"
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 82153960ac00..97c34136b027 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -7,9 +7,12 @@
 #
 
 OBJCOPYFLAGS:= -O binary
-LDFLAGS_vmlinux :=
+ifeq ($(CONFIG_RELOCATABLE),y)
+   LDFLAGS_vmlinux += -shared -Bsymbolic -z notext -z norelro
+   KBUILD_CFLAGS += -fPIE
+endif
 ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
-   LDFLAGS_vmlinux := --no-relax
+   LDFLAGS_vmlinux += --no-relax
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
CC_FLAGS_FTRACE := -fpatchable-function-entry=8
 endif
diff --git a/arch/riscv/kernel/efi-header.S b/arch/riscv/kernel/efi-header.S
index 8e733aa48ba6..f7ee09c4f12d 100644
--- a/arch/riscv/kernel/efi-header.S
+++ b/arch/riscv/kernel/efi-header.S
@@ -33,7 +33,7 @@ optional_header:
.byte   0x02// MajorLinkerVersion
.byte   0x14// MinorLinkerVersion
.long   __pecoff_text_end - efi_header_end  // SizeOfCode
-   .long   __pecoff_data_virt_size // SizeOfInitializedData
+   .long   __pecoff_data_virt_end - __pecoff_text_end  // SizeOfInitializedData
    .long   0   // SizeOfUninitializedData
.long   __efistub_efi_pe_entry - _start // AddressOfEntryPoint
.long   efi_header_end - _start // BaseOfCode
@@ -91,9 +91,9 @@ section_table:
IMAGE_SCN_MEM_EXECUTE   // Characteristics
 
.ascii  ".data\0\0\0"
-   .long   __pecoff_data_virt_size // VirtualSize
+   .long   __pecoff_data_virt_end - __pecoff_text_end  // VirtualSize
.long   __pecoff_text_end - _start  // VirtualAddress
-   .long   __pecoff_data_raw_size  // SizeOfRawData
+   .long   __pecoff_data_raw_end - __pecoff_text_end   // SizeOfRawData
.long   __pecoff_text_end - _start  // PointerToRawData
 
.long   0   // PointerToRelocations
diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
index 4e6c88aa4d87..8be2de3be08c 100644
--- a/arch/riscv/kernel/vmlinux.lds.S
+++ b/arch/riscv/kernel/vmlinux.lds.S
@@ -122,9 +122,15 @@ SECTIONS
*(.sdata*)
}
 
+   .rela.dyn : ALIGN(8) {
+   __rela_dyn_start = .;
+   *(.rela .rela*)
+   __rela_dyn_end = .;
+   }
+
 #ifdef CONFIG_EFI
.pecoff_edata_padding : { BYTE(0); . = ALIGN(PECOFF_FILE_ALIGNMENT); }
-   __pecoff_data_raw_size = ABSOLUTE(. - __pecoff_text_end);
+   __pecoff_data_raw_end = ABSOLUTE(.);
 #endif
 
/* End of data section */
@@ -134,7 +140,7 @@ SECTIONS
 
 #ifdef CONFIG_EFI
. = ALIGN(PECOFF_SECTION_ALIGNMENT);
-   __pecoff_data_virt_size = ABSOLUTE(. - __pecoff_text_end);
+   __pecoff_data_virt_end = ABSOLUTE(.);
 #endif
_end = .;
 
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index 2ac177c05352..b85e9e82f082 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -1,6 +1,10 @@
 # SPDX-License-Identifier: 

[PATCH v8 0/3] Introduce 64b relocatable kernel

2023-02-15 Thread Alexandre Ghiti
After multiple attempts, this patchset is now based on the fact that the
64b kernel mapping was moved outside the linear mapping.

The first patch allows building relocatable kernels, but the option is
not selected by default. That patch is a requirement for KASLR.
The second and third patches take advantage of an already existing powerpc
script that checks relocations at compile time, and use it for riscv.

This patchset is rebased on top of:

RISC-V kasan rework (https://lore.kernel.org/lkml/Y6TTvku%2FyuSjm42j@spud/T/)
riscv: Use PUD/P4D/PGD pages for the linear mapping 
(https://lore.kernel.org/lkml/20230125114229.hrhsyw4aegrnmoau@orel/T/)
riscv: Allow to downgrade paging mode from the command line 
(https://lore.kernel.org/lkml/CAHVXubjeSMvfTPnvrnYRupOGx6+vUvUGfRS3piTeo=th2ch...@mail.gmail.com/)
base-commit-tag: v6.2-rc7

Changes in v8:
  * Fix UEFI boot by moving rela.dyn section into the data so that PE/COFF
loader actually copies the relocations too
  * Fix check that used PGDIR instead of PUD which was not correct
for sv48 and sv57
  * Fix PE/COFF header data size definition as it led to size of 0

Changes in v7:
  * Rebase on top of v5.15
  * Fix LDFLAGS_vmlinux which was overriden when CONFIG_DYNAMIC_FTRACE was
set
  * Make relocate_kernel static
  * Add Ack from Michael

Changes in v6:
  * Remove the kernel move to vmalloc zone
  * Rebased on top of for-next
  * Remove relocatable property from 32b kernel as the kernel is mapped in
the linear mapping and would then need to be copied physically too
  * CONFIG_RELOCATABLE depends on !XIP_KERNEL
  * Remove Reviewed-by from first patch as it changed a bit

Changes in v5:
  * Add "static __init" to create_kernel_page_table function as reported by
Kbuild test robot
  * Add reviewed-by from Zong
  * Rebase onto v5.7

Changes in v4:
  * Fix BPF region that overlapped with kernel's as suggested by Zong
  * Fix end of module region that could be larger than 2GB as suggested by Zong
  * Fix the size of the vm area reserved for the kernel as we could lose
PMD_SIZE if the size was already aligned on PMD_SIZE
  * Split compile time relocations check patch into 2 patches as suggested by 
Anup
  * Applied Reviewed-by from Zong and Anup

Changes in v3:
  * Move kernel mapping to vmalloc

Changes in v2:
  * Make RELOCATABLE depend on MMU as suggested by Anup
  * Rename kernel_load_addr into kernel_virt_addr as suggested by Anup
  * Use __pa_symbol instead of __pa, as suggested by Zong
  * Rebased on top of v5.6-rc3
  * Tested with sv48 patchset
  * Add Reviewed/Tested-by from Zong and Anup

Alexandre Ghiti (3):
  riscv: Introduce CONFIG_RELOCATABLE
  powerpc: Move script to check relocations at compile time in scripts/
  riscv: Check relocations at compile time

 arch/powerpc/tools/relocs_check.sh | 18 ++
 arch/riscv/Kconfig | 14 
 arch/riscv/Makefile|  7 ++--
 arch/riscv/Makefile.postlink   | 36 
 arch/riscv/kernel/efi-header.S |  6 ++--
 arch/riscv/kernel/vmlinux.lds.S| 10 --
 arch/riscv/mm/Makefile |  4 +++
 arch/riscv/mm/init.c   | 54 +-
 arch/riscv/tools/relocs_check.sh   | 26 ++
 scripts/relocs_check.sh| 20 +++
 10 files changed, 171 insertions(+), 24 deletions(-)
 create mode 100644 arch/riscv/Makefile.postlink
 create mode 100755 arch/riscv/tools/relocs_check.sh
 create mode 100755 scripts/relocs_check.sh

--
2.37.2



[PATCH v1 4/9] powerpc/gamecube|wii : Use machine_device_initcall()

2023-02-15 Thread Christophe Leroy
Instead of checking the machine type in the function,
use machine_device_initcall().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/embedded6xx/gamecube.c | 5 +
 arch/powerpc/platforms/embedded6xx/wii.c  | 5 +
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/gamecube.c 
b/arch/powerpc/platforms/embedded6xx/gamecube.c
index 4fc84ff95b5e..60cdc2852c7a 100644
--- a/arch/powerpc/platforms/embedded6xx/gamecube.c
+++ b/arch/powerpc/platforms/embedded6xx/gamecube.c
@@ -83,11 +83,8 @@ static const struct of_device_id gamecube_of_bus[] = {
 
 static int __init gamecube_device_probe(void)
 {
-   if (!machine_is(gamecube))
-   return 0;
-
of_platform_bus_probe(NULL, gamecube_of_bus, NULL);
return 0;
 }
-device_initcall(gamecube_device_probe);
+machine_device_initcall(gamecube, gamecube_device_probe);
 
diff --git a/arch/powerpc/platforms/embedded6xx/wii.c 
b/arch/powerpc/platforms/embedded6xx/wii.c
index f2cc00e6f12f..635c393d307a 100644
--- a/arch/powerpc/platforms/embedded6xx/wii.c
+++ b/arch/powerpc/platforms/embedded6xx/wii.c
@@ -161,13 +161,10 @@ static const struct of_device_id wii_of_bus[] = {
 
 static int __init wii_device_probe(void)
 {
-   if (!machine_is(wii))
-   return 0;
-
of_platform_populate(NULL, wii_of_bus, NULL, NULL);
return 0;
 }
-device_initcall(wii_device_probe);
+machine_device_initcall(wii, wii_device_probe);
 
 define_machine(wii) {
.name   = "wii",
-- 
2.39.1



[PATCH v1 9/9] powerpc/85xx: Don't check ppc_md.progress in mpc85xx_cds_setup_arch()

2023-02-15 Thread Christophe Leroy
mpc85xx_cds_setup_arch() is not a hot path; creating the string to be
printed even when it doesn't get printed in the end is not a problem.

Do it unconditionally.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/85xx/mpc85xx_cds.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/mpc85xx_cds.c 
b/arch/powerpc/platforms/85xx/mpc85xx_cds.c
index 41079d02dee8..dd969311b78e 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_cds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_cds.c
@@ -318,6 +318,7 @@ static void __init mpc85xx_cds_setup_arch(void)
 {
struct device_node *np;
int cds_pci_slot;
+   char buf[40];
 
ppc_md_progress("mpc85xx_cds_setup_arch()", 0);
 
@@ -334,13 +335,9 @@ static void __init mpc85xx_cds_setup_arch(void)
return;
}
 
-   if (ppc_md.progress) {
-   char buf[40];
-   cds_pci_slot = ((in_8(&cadmus->cm_csr) >> 6) & 0x3) + 1;
-   snprintf(buf, 40, "CDS Version = 0x%x in slot %d\n",
-   in_8(&cadmus->cm_ver), cds_pci_slot);
-   ppc_md_progress(buf, 0);
-   }
+   cds_pci_slot = ((in_8(&cadmus->cm_csr) >> 6) & 0x3) + 1;
+   snprintf(buf, 40, "CDS Version = 0x%x in slot %d\n", in_8(&cadmus->cm_ver), cds_pci_slot);
+   ppc_md_progress(buf, 0);
 
 #ifdef CONFIG_PCI
ppc_md.pci_irq_fixup = mpc85xx_cds_pci_irq_fixup;
-- 
2.39.1



[PATCH v1 7/9] powerpc: Add ppc_md_progress()

2023-02-15 Thread Christophe Leroy
Many places have:

	if (ppc_md.progress)
		ppc_md.progress(...);

Introduce ppc_md_progress(), which embeds the test.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/machdep.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index c5dfe5ff923c..77e126e9cabc 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -234,6 +234,12 @@ static inline void log_error(char *buf, unsigned int 
err_type, int fatal)
ppc_md.log_error(buf, err_type, fatal);
 }
 
+static inline void ppc_md_progress(char *s, unsigned short hex)
+{
+   if (ppc_md.progress)
+   ppc_md.progress(s, hex);
+}
+
 #define __define_machine_initcall(mach, fn, id) \
static int __init __machine_initcall_##mach##_##fn(void) { \
if (machine_is(mach)) return fn(); \
-- 
2.39.1



[PATCH v1 3/9] powerpc/47x: Split ppc47x machine in two

2023-02-15 Thread Christophe Leroy
This machine matches two compatibles and sets .pci_irq_fixup
on one of them.

Split it into two machines, then the probe function can be dropped.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/44x/ppc476.c | 31 +
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/44x/ppc476.c 
b/arch/powerpc/platforms/44x/ppc476.c
index 7c91ac5a5241..f0b411cc7bb7 100644
--- a/arch/powerpc/platforms/44x/ppc476.c
+++ b/arch/powerpc/platforms/44x/ppc476.c
@@ -268,27 +268,24 @@ static void ppc47x_pci_irq_fixup(struct pci_dev *dev)
}
 }
 
-/*
- * Called very early, MMU is off, device-tree isn't unflattened
- */
-static int __init ppc47x_probe(void)
-{
-   if (of_machine_is_compatible("ibm,akebono"))
-   return 1;
-
-   if (of_machine_is_compatible("ibm,currituck")) {
-   ppc_md.pci_irq_fixup = ppc47x_pci_irq_fixup;
-   return 1;
-   }
-
-   return 0;
-}
+define_machine(ppc47x_akebono) {
+   .name   = "PowerPC 47x (akebono)",
+   .compatible = "ibm,akebono",
+   .probe  = ppc47x_probe,
+   .progress   = udbg_progress,
+   .init_IRQ   = ppc47x_init_irq,
+   .setup_arch = ppc47x_setup_arch,
+   .restart= ppc4xx_reset_system,
+   .calibrate_decr = generic_calibrate_decr,
+};
 
-define_machine(ppc47x) {
-   .name   = "PowerPC 47x",
+define_machine(ppc47x_currituck) {
+   .name   = "PowerPC 47x (currituck)",
+   .compatible = "ibm,currituck",
.probe  = ppc47x_probe,
.progress   = udbg_progress,
.init_IRQ   = ppc47x_init_irq,
+   .pci_irq_fixup  = ppc47x_pci_irq_fixup,
.setup_arch = ppc47x_setup_arch,
.restart= ppc4xx_reset_system,
.calibrate_decr = generic_calibrate_decr,
-- 
2.39.1



[PATCH v1 8/9] powerpc: Use ppc_md_progress()

2023-02-15 Thread Christophe Leroy
Many places have:

	if (ppc_md.progress)
		ppc_md.progress(...);

Use ppc_md_progress() instead.

Note that checkpatch complains about the function names in the progress
strings, but ppc_md_progress() does not take a format string, so we
leave the function names as-is for now.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/setup_32.c |  3 +--
 arch/powerpc/mm/book3s32/mmu.c | 10 --
 arch/powerpc/mm/init_32.c  | 12 
 arch/powerpc/platforms/52xx/efika.c|  3 +--
 arch/powerpc/platforms/52xx/lite5200.c |  3 +--
 arch/powerpc/platforms/52xx/media5200.c|  3 +--
 arch/powerpc/platforms/52xx/mpc5200_simple.c   |  3 +--
 arch/powerpc/platforms/82xx/ep8248e.c  |  6 ++
 arch/powerpc/platforms/82xx/km82xx.c   |  6 ++
 arch/powerpc/platforms/82xx/mpc8272_ads.c  |  6 ++
 arch/powerpc/platforms/82xx/pq2fads.c  |  6 ++
 arch/powerpc/platforms/83xx/misc.c |  3 +--
 arch/powerpc/platforms/85xx/bsc913x_qds.c  |  3 +--
 arch/powerpc/platforms/85xx/bsc913x_rdb.c  |  3 +--
 arch/powerpc/platforms/85xx/c293pcie.c |  3 +--
 arch/powerpc/platforms/85xx/ge_imp3a.c |  3 +--
 arch/powerpc/platforms/85xx/ksi8560.c  |  3 +--
 arch/powerpc/platforms/85xx/mpc8536_ds.c   |  3 +--
 arch/powerpc/platforms/85xx/mpc85xx_ads.c  |  3 +--
 arch/powerpc/platforms/85xx/mpc85xx_cds.c  |  5 ++---
 arch/powerpc/platforms/85xx/mpc85xx_ds.c   |  3 +--
 arch/powerpc/platforms/85xx/mpc85xx_mds.c  |  3 +--
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c  |  3 +--
 arch/powerpc/platforms/85xx/mvme2500.c |  3 +--
 arch/powerpc/platforms/85xx/p1010rdb.c |  3 +--
 arch/powerpc/platforms/85xx/p1022_ds.c |  3 +--
 arch/powerpc/platforms/85xx/p1022_rdk.c|  3 +--
 arch/powerpc/platforms/85xx/p1023_rdb.c|  3 +--
 arch/powerpc/platforms/85xx/ppa8548.c  |  3 +--
 arch/powerpc/platforms/85xx/qemu_e500.c|  2 +-
 arch/powerpc/platforms/85xx/socrates.c |  3 +--
 arch/powerpc/platforms/85xx/stx_gp3.c  |  3 +--
 arch/powerpc/platforms/85xx/tqm85xx.c  |  3 +--
 arch/powerpc/platforms/85xx/twr_p102x.c|  3 +--
 arch/powerpc/platforms/86xx/mpc8610_hpcd.c |  3 +--
 arch/powerpc/platforms/86xx/mpc86xx_hpcn.c |  3 +--
 arch/powerpc/platforms/86xx/mvme7100.c |  3 +--
 arch/powerpc/platforms/amigaone/setup.c|  3 +--
 arch/powerpc/platforms/chrp/setup.c|  5 ++---
 arch/powerpc/platforms/embedded6xx/holly.c |  6 ++
 .../platforms/embedded6xx/mpc7448_hpc2.c   |  7 +++
 arch/powerpc/platforms/embedded6xx/mvme5100.c  |  3 +--
 arch/powerpc/platforms/powermac/smp.c  | 18 --
 arch/powerpc/platforms/pseries/setup.c |  8 
 44 files changed, 69 insertions(+), 121 deletions(-)

diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index b761cc1a403c..843f64050efc 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -127,8 +127,7 @@ __setup("l3cr=", ppc_setup_l3cr);
 static int __init ppc_init(void)
 {
/* clear the progress line */
-   if (ppc_md.progress)
-   ppc_md.progress(" ", 0x);
+   ppc_md_progress(" ", 0x);
 
/* call platform init */
if (ppc_md.init != NULL) {
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 850783cfa9c7..ec6facff2779 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -347,7 +347,7 @@ void __init MMU_init_hw(void)
if (!mmu_has_feature(MMU_FTR_HPTE_TABLE))
return;
 
-   if ( ppc_md.progress ) ppc_md.progress("hash:enter", 0x105);
+   ppc_md_progress("hash:enter", 0x105);
 
 #define LG_HPTEG_SIZE  6   /* 64 bytes per HPTEG */
 #define SDR1_LOW_BITS  ((n_hpteg - 1) >> 10)
@@ -371,7 +371,7 @@ void __init MMU_init_hw(void)
/*
 * Find some memory for the hash table.
 */
-   if ( ppc_md.progress ) ppc_md.progress("hash:find piece", 0x322);
+   ppc_md_progress("hash:find piece", 0x322);
Hash = memblock_alloc(Hash_size, Hash_size);
if (!Hash)
panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
@@ -396,10 +396,8 @@ void __init MMU_init_hw_patch(void)
if (!mmu_has_feature(MMU_FTR_HPTE_TABLE))
return;
 
-   if (ppc_md.progress)
-   ppc_md.progress("hash:patch", 0x345);
-   if (ppc_md.progress)
-   ppc_md.progress("hash:done", 0x205);
+   ppc_md_progress("hash:patch", 0x345);
+   ppc_md_progress("hash:done", 0x205);
 
/* WARNING: Make sure nothing can trigger a KASAN check past this point 
*/
 
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index d4cc3749e621..97e0f58dd401 100644
--- 

[PATCH v1 6/9] powerpc: Make generic_calibrate_decr() the default

2023-02-15 Thread Christophe Leroy
ppc_md.calibrate_decr() is a mandatory item. It is never NULL-checked,
so it must be non-NULL on all platforms.

Most platforms define generic_calibrate_decr() as their
ppc_md.calibrate_decr(). Have time_init() call
generic_calibrate_decr() when ppc_md.calibrate_decr() is NULL,
and remove the default assignment from all machines.
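
The resulting logic in time_init() then looks roughly like this (sketch
only; the time.c hunk is not reproduced in full here):

  /* in time_init() */
  if (ppc_md.calibrate_decr)
          ppc_md.calibrate_decr();
  else
          generic_calibrate_decr();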

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/time.c|  6 +-
 arch/powerpc/platforms/40x/ppc40x_simple.c|  1 -
 arch/powerpc/platforms/44x/canyonlands.c  |  1 -
 arch/powerpc/platforms/44x/ebony.c|  1 -
 arch/powerpc/platforms/44x/fsp2.c |  1 -
 arch/powerpc/platforms/44x/iss4xx.c   |  1 -
 arch/powerpc/platforms/44x/ppc44x_simple.c|  1 -
 arch/powerpc/platforms/44x/ppc476.c   |  2 --
 arch/powerpc/platforms/44x/sam440ep.c |  1 -
 arch/powerpc/platforms/44x/warp.c |  1 -
 arch/powerpc/platforms/512x/mpc5121_ads.c |  1 -
 arch/powerpc/platforms/512x/mpc512x_generic.c |  1 -
 arch/powerpc/platforms/512x/pdm360ng.c|  1 -
 arch/powerpc/platforms/52xx/efika.c   |  1 -
 arch/powerpc/platforms/52xx/lite5200.c|  1 -
 arch/powerpc/platforms/52xx/media5200.c   |  1 -
 arch/powerpc/platforms/52xx/mpc5200_simple.c  |  1 -
 arch/powerpc/platforms/82xx/ep8248e.c |  1 -
 arch/powerpc/platforms/82xx/km82xx.c  |  1 -
 arch/powerpc/platforms/82xx/mpc8272_ads.c |  1 -
 arch/powerpc/platforms/82xx/pq2fads.c |  1 -
 arch/powerpc/platforms/83xx/asp834x.c |  1 -
 arch/powerpc/platforms/83xx/km83xx.c  |  1 -
 arch/powerpc/platforms/83xx/mpc830x_rdb.c |  1 -
 arch/powerpc/platforms/83xx/mpc831x_rdb.c |  1 -
 arch/powerpc/platforms/83xx/mpc832x_mds.c |  1 -
 arch/powerpc/platforms/83xx/mpc832x_rdb.c |  1 -
 arch/powerpc/platforms/83xx/mpc834x_itx.c |  1 -
 arch/powerpc/platforms/83xx/mpc834x_mds.c |  1 -
 arch/powerpc/platforms/83xx/mpc836x_mds.c |  1 -
 arch/powerpc/platforms/83xx/mpc836x_rdk.c |  1 -
 arch/powerpc/platforms/83xx/mpc837x_mds.c |  1 -
 arch/powerpc/platforms/83xx/mpc837x_rdb.c |  1 -
 arch/powerpc/platforms/85xx/bsc913x_qds.c |  1 -
 arch/powerpc/platforms/85xx/bsc913x_rdb.c |  1 -
 arch/powerpc/platforms/85xx/c293pcie.c|  1 -
 arch/powerpc/platforms/85xx/corenet_generic.c |  1 -
 arch/powerpc/platforms/85xx/ge_imp3a.c|  1 -
 arch/powerpc/platforms/85xx/ksi8560.c |  1 -
 arch/powerpc/platforms/85xx/mpc8536_ds.c  |  1 -
 arch/powerpc/platforms/85xx/mpc85xx_ads.c |  1 -
 arch/powerpc/platforms/85xx/mpc85xx_cds.c |  1 -
 arch/powerpc/platforms/85xx/mpc85xx_ds.c  |  3 ---
 arch/powerpc/platforms/85xx/mpc85xx_mds.c |  3 ---
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 10 --
 arch/powerpc/platforms/85xx/mvme2500.c|  1 -
 arch/powerpc/platforms/85xx/p1010rdb.c|  1 -
 arch/powerpc/platforms/85xx/p1022_ds.c|  1 -
 arch/powerpc/platforms/85xx/p1022_rdk.c   |  1 -
 arch/powerpc/platforms/85xx/p1023_rdb.c   |  1 -
 arch/powerpc/platforms/85xx/ppa8548.c |  1 -
 arch/powerpc/platforms/85xx/qemu_e500.c   |  1 -
 arch/powerpc/platforms/85xx/socrates.c|  1 -
 arch/powerpc/platforms/85xx/stx_gp3.c |  1 -
 arch/powerpc/platforms/85xx/tqm85xx.c |  1 -
 arch/powerpc/platforms/85xx/twr_p102x.c   |  1 -
 arch/powerpc/platforms/85xx/xes_mpc85xx.c |  3 ---
 arch/powerpc/platforms/86xx/gef_ppc9a.c   |  1 -
 arch/powerpc/platforms/86xx/gef_sbc310.c  |  1 -
 arch/powerpc/platforms/86xx/gef_sbc610.c  |  1 -
 arch/powerpc/platforms/86xx/mpc8610_hpcd.c|  1 -
 arch/powerpc/platforms/86xx/mpc86xx_hpcn.c|  1 -
 arch/powerpc/platforms/86xx/mvme7100.c|  1 -
 arch/powerpc/platforms/8xx/adder875.c |  1 -
 arch/powerpc/platforms/amigaone/setup.c   |  1 -
 arch/powerpc/platforms/cell/setup.c   |  1 -
 arch/powerpc/platforms/chrp/setup.c   |  1 -
 arch/powerpc/platforms/embedded6xx/gamecube.c |  1 -
 arch/powerpc/platforms/embedded6xx/holly.c|  1 -
 arch/powerpc/platforms/embedded6xx/linkstation.c  |  1 -
 arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c |  1 -
 arch/powerpc/platforms/embedded6xx/mvme5100.c |  1 -
 arch/powerpc/platforms/embedded6xx/storcenter.c   |  1 -
 arch/powerpc/platforms/embedded6xx/wii.c  |  1 -
 arch/powerpc/platforms/maple/setup.c  |  1 -
 arch/powerpc/platforms/microwatt/setup.c  |  1 -
 arch/powerpc/platforms/pasemi/setup.c |  1 -
 arch/powerpc/platforms/powernv/setup.c|  1 -
 arch/powerpc/platforms/pseries/setup.c|  1 -
 79 files changed, 5 

[PATCH v1 1/9] powerpc/machdep: Define 'compatible' property in ppc_md and use it

2023-02-15 Thread Christophe Leroy
Most probe functions do nothing other than check whether
the machine is compatible with a given string.

Define that string in the ppc_md structure and check it directly from
probe_machine() instead of using ppc_md.probe() for that.

Keep checking in ppc_md.probe() only for more complex probing.
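
With the new field, a trivial machine description looks like this
(hypothetical board, illustrative only; see patch 2 for real
conversions):

  define_machine(myboard) {
          .name       = "My Board",
          .compatible = "myvendor,myboard",   /* replaces a trivial probe() */
          .setup_arch = myboard_setup_arch,
  };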

Signed-off-by: Christophe Leroy 
---
v3: New
---
 arch/powerpc/include/asm/machdep.h |  1 +
 arch/powerpc/kernel/setup-common.c | 13 +++--
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 378b8d5836a7..c5dfe5ff923c 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -20,6 +20,7 @@ struct pci_host_bridge;
 
 struct machdep_calls {
char*name;
+   char*compatible;
 #ifdef CONFIG_PPC64
 #ifdef CONFIG_PM
void(*iommu_restore)(void);
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 9b10e57040c6..d1e205fe72ba 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -626,13 +626,14 @@ static __init void probe_machine(void)
for (machine_id = &__machine_desc_start;
 machine_id < &__machine_desc_end;
 machine_id++) {
-   DBG("  %s ...", machine_id->name);
+   DBG("  %s ...\n", machine_id->name);
+   if (machine_id->compatible &&
+       !of_machine_is_compatible(machine_id->compatible))
+   continue;
    memcpy(&ppc_md, machine_id, sizeof(struct machdep_calls));
-   if (ppc_md.probe()) {
-   DBG(" match !\n");
-   break;
-   }
-   DBG("\n");
+   if (ppc_md.probe && !ppc_md.probe())
+   continue;
+   DBG("   %s match !\n", machine_id->name);
+   break;
}
/* What can we do if we didn't find ? */
if (machine_id >= &__machine_desc_end) {
-- 
2.39.1



[PATCH v1 5/9] powerpc/85xx: Fix function naming for p1023_rdb platform

2023-02-15 Thread Christophe Leroy
The p1023_rdb platform is a copy of the mpc85xx_rdb platform and some of
its functions have kept mpc85xx_rdb names.

Rename those functions.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/85xx/p1023_rdb.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/p1023_rdb.c 
b/arch/powerpc/platforms/85xx/p1023_rdb.c
index 37e78f40d424..e199fc44fc2f 100644
--- a/arch/powerpc/platforms/85xx/p1023_rdb.c
+++ b/arch/powerpc/platforms/85xx/p1023_rdb.c
@@ -37,7 +37,7 @@
  * Setup the architecture
  *
  */
-static void __init mpc85xx_rdb_setup_arch(void)
+static void __init p1023_rdb_setup_arch(void)
 {
struct device_node *np;
 
@@ -83,7 +83,7 @@ static void __init mpc85xx_rdb_setup_arch(void)
 
 machine_arch_initcall(p1023_rdb, mpc85xx_common_publish_devices);
 
-static void __init mpc85xx_rdb_pic_init(void)
+static void __init p1023_rdb_pic_init(void)
 {
struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN |
MPIC_SINGLE_DEST_CPU,
@@ -97,8 +97,8 @@ static void __init mpc85xx_rdb_pic_init(void)
 define_machine(p1023_rdb) {
.name   = "P1023 RDB",
.compatible = "fsl,P1023RDB",
-   .setup_arch = mpc85xx_rdb_setup_arch,
-   .init_IRQ   = mpc85xx_rdb_pic_init,
+   .setup_arch = p1023_rdb_setup_arch,
+   .init_IRQ   = p1023_rdb_pic_init,
.get_irq= mpic_get_irq,
.calibrate_decr = generic_calibrate_decr,
.progress   = udbg_progress,
-- 
2.39.1



[PATCH v1 2/9] powerpc/platforms: Use 'compatible' property for simple cases

2023-02-15 Thread Christophe Leroy
Use the new 'compatible' property for the simple cases.

checkpatch complains about the new compatible strings being
undocumented, but in reality nothing is new, so just ignore it for the
time being.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/44x/canyonlands.c  |  9 +--
 arch/powerpc/platforms/44x/ebony.c|  4 +-
 arch/powerpc/platforms/44x/iss4xx.c   | 13 +--
 arch/powerpc/platforms/44x/sam440ep.c |  4 +-
 arch/powerpc/platforms/44x/warp.c | 10 +--
 arch/powerpc/platforms/512x/mpc5121_ads.c |  4 +-
 arch/powerpc/platforms/512x/pdm360ng.c|  4 +-
 arch/powerpc/platforms/52xx/media5200.c   | 16 +---
 arch/powerpc/platforms/82xx/ep8248e.c | 10 +--
 arch/powerpc/platforms/82xx/km82xx.c  | 10 +--
 arch/powerpc/platforms/82xx/mpc8272_ads.c | 10 +--
 arch/powerpc/platforms/82xx/pq2fads.c | 10 +--
 arch/powerpc/platforms/83xx/asp834x.c | 10 +--
 arch/powerpc/platforms/83xx/mpc832x_mds.c | 10 +--
 arch/powerpc/platforms/83xx/mpc832x_rdb.c | 10 +--
 arch/powerpc/platforms/83xx/mpc834x_itx.c | 10 +--
 arch/powerpc/platforms/83xx/mpc834x_mds.c | 10 +--
 arch/powerpc/platforms/83xx/mpc836x_mds.c | 10 +--
 arch/powerpc/platforms/83xx/mpc836x_rdk.c | 10 +--
 arch/powerpc/platforms/83xx/mpc837x_mds.c | 10 +--
 arch/powerpc/platforms/85xx/bsc913x_qds.c | 11 +--
 arch/powerpc/platforms/85xx/bsc913x_rdb.c | 11 +--
 arch/powerpc/platforms/85xx/c293pcie.c| 12 +--
 arch/powerpc/platforms/85xx/ge_imp3a.c| 10 +--
 arch/powerpc/platforms/85xx/ksi8560.c | 10 +--
 arch/powerpc/platforms/85xx/mpc8536_ds.c  | 10 +--
 arch/powerpc/platforms/85xx/mpc85xx_ads.c | 10 +--
 arch/powerpc/platforms/85xx/mpc85xx_cds.c | 11 +--
 arch/powerpc/platforms/85xx/mpc85xx_ds.c  | 30 +--
 arch/powerpc/platforms/85xx/mpc85xx_mds.c | 22 +
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 81 +++
 arch/powerpc/platforms/85xx/mvme2500.c| 10 +--
 arch/powerpc/platforms/85xx/p1022_ds.c| 10 +--
 arch/powerpc/platforms/85xx/p1022_rdk.c   | 10 +--
 arch/powerpc/platforms/85xx/p1023_rdb.c   |  8 +-
 arch/powerpc/platforms/85xx/ppa8548.c | 10 +--
 arch/powerpc/platforms/85xx/qemu_e500.c   | 10 +--
 arch/powerpc/platforms/85xx/socrates.c| 13 +--
 arch/powerpc/platforms/85xx/stx_gp3.c | 10 +--
 arch/powerpc/platforms/85xx/twr_p102x.c   |  7 +-
 arch/powerpc/platforms/85xx/xes_mpc85xx.c | 24 +-
 arch/powerpc/platforms/86xx/gef_ppc9a.c   | 18 +
 arch/powerpc/platforms/86xx/gef_sbc310.c  | 18 +
 arch/powerpc/platforms/86xx/gef_sbc610.c  | 18 +
 arch/powerpc/platforms/86xx/mpc8610_hpcd.c| 13 +--
 arch/powerpc/platforms/86xx/mpc86xx_hpcn.c| 14 +---
 arch/powerpc/platforms/8xx/adder875.c |  7 +-
 arch/powerpc/platforms/8xx/ep88xc.c   |  7 +-
 arch/powerpc/platforms/8xx/mpc86xads_setup.c  |  7 +-
 arch/powerpc/platforms/8xx/mpc885ads_setup.c  |  7 +-
 arch/powerpc/platforms/8xx/tqm8xx_setup.c |  7 +-
 arch/powerpc/platforms/amigaone/setup.c   | 21 +++--
 arch/powerpc/platforms/embedded6xx/gamecube.c |  4 +-
 arch/powerpc/platforms/embedded6xx/holly.c| 12 +--
 .../platforms/embedded6xx/linkstation.c   |  4 +-
 .../platforms/embedded6xx/mpc7448_hpc2.c  | 12 +--
 arch/powerpc/platforms/embedded6xx/mvme5100.c | 10 +--
 .../platforms/embedded6xx/storcenter.c|  7 +-
 arch/powerpc/platforms/embedded6xx/wii.c  |  4 +-
 arch/powerpc/platforms/microwatt/setup.c  |  7 +-
 arch/powerpc/platforms/powernv/setup.c|  4 +-
 arch/powerpc/platforms/ps3/setup.c|  4 +-
 62 files changed, 88 insertions(+), 631 deletions(-)

diff --git a/arch/powerpc/platforms/44x/canyonlands.c 
b/arch/powerpc/platforms/44x/canyonlands.c
index 5b23aef8bdef..ba561ca6c25f 100644
--- a/arch/powerpc/platforms/44x/canyonlands.c
+++ b/arch/powerpc/platforms/44x/canyonlands.c
@@ -39,11 +39,9 @@ machine_device_initcall(canyonlands, ppc460ex_device_probe);
 
 static int __init ppc460ex_probe(void)
 {
-   if (of_machine_is_compatible("amcc,canyonlands")) {
-   pci_set_flags(PCI_REASSIGN_ALL_RSRC);
-   return 1;
-   }
-   return 0;
+   pci_set_flags(PCI_REASSIGN_ALL_RSRC);
+
+   return 1;
 }
 
 /* USB PHY fixup code on Canyonlands kit. */
@@ -110,6 +108,7 @@ static int __init ppc460ex_canyonlands_fixup(void)
 machine_device_initcall(canyonlands, ppc460ex_canyonlands_fixup);
 define_machine(canyonlands) {
.name = "Canyonlands",
+   .compatible = "amcc,canyonlands",
.probe = ppc460ex_probe,
.progress = udbg_progress,
.init_IRQ = uic_init_tree,
diff --git a/arch/powerpc/platforms/44x/ebony.c 
b/arch/powerpc/platforms/44x/ebony.c
index 0d8f202bc45f..5b9e57b4cd65 100644
--- a/arch/powerpc/platforms/44x/ebony.c
+++ b/arch/powerpc/platforms/44x/ebony.c
@@ -45,9 +45,6 @@ 
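
The same mechanical conversion repeats across all 62 files: a board
probe that did nothing but check one compatible string disappears, and
the new .compatible field lets generic code perform the check. A
simplified sketch of what such a generic matcher could look like
(illustrative only; machine_matches() is a hypothetical helper, while
of_machine_is_compatible() is the real device-tree API):

/* Sketch, not the actual arch/powerpc implementation. */
static int __init machine_matches(const struct machdep_calls *md)
{
	/* New path: match the device tree root against .compatible. */
	if (md->compatible && !of_machine_is_compatible(md->compatible))
		return 0;
	/* Old path kept for boards that still need custom logic. */
	if (md->probe && !md->probe())
		return 0;
	return 1;
}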

Re: [PATCH v2 04/24] arm64/cpu: Mark cpu_die() __noreturn

2023-02-15 Thread Mark Rutland
On Tue, Feb 14, 2023 at 09:13:08AM +0100, Philippe Mathieu-Daudé wrote:
> On 14/2/23 08:05, Josh Poimboeuf wrote:
> > cpu_die() doesn't return.  Annotate it as such.  By extension this also
> > makes arch_cpu_idle_dead() noreturn.
> > 
> > Signed-off-by: Josh Poimboeuf 
> > ---
> >   arch/arm64/include/asm/smp.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> > index fc55f5a57a06..5733a31bab08 100644
> > --- a/arch/arm64/include/asm/smp.h
> > +++ b/arch/arm64/include/asm/smp.h
> > @@ -100,7 +100,7 @@ static inline void arch_send_wakeup_ipi_mask(const 
> > struct cpumask *mask)
> >   extern int __cpu_disable(void);
> >   extern void __cpu_die(unsigned int cpu);
> > -extern void cpu_die(void);
> > +extern void __noreturn cpu_die(void);
> >   extern void cpu_die_early(void);
> 
> Shouldn't cpu_operations::cpu_die() be declared noreturn first?

The cpu_die() function ends with a BUG(), and so does not return, even if a
cpu_operations::cpu_die() function that it calls erroneously returned.

We *could* mark cpu_operations::cpu_die() as noreturn, but I'd prefer that we
did not so that the compiler doesn't optimize away the BUG() which is there to
catch such erroneous returns.

That said, could we please add __noreturn to the implementation of cpu_die() in
arch/arm64/kernel/smp.c? i.e. the fixup below.

With that fixup:

Acked-by: Mark Rutland 

Mark.

>8
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf695..a98a76f7c1c6 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -361,7 +361,7 @@ void __cpu_die(unsigned int cpu)
  * Called from the idle thread for the CPU which has been shutdown.
  *
  */
-void cpu_die(void)
+void __noreturn cpu_die(void)
 {
unsigned int cpu = smp_processor_id();
const struct cpu_operations *ops = get_cpu_ops(cpu);
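
A self-contained userspace sketch of the same shape shows why keeping
the hook non-noreturn preserves the safety net (hypothetical names
throughout; only __attribute__((noreturn)) and abort() are standard):

#include <stdio.h>
#include <stdlib.h>

struct cpu_ops {
	void (*die)(void);	/* deliberately NOT declared noreturn */
};

static void buggy_die(void)
{
	/* erroneously returns instead of stopping the CPU */
}

static void __attribute__((noreturn)) die_wrapper(const struct cpu_ops *ops)
{
	if (ops && ops->die)
		ops->die();
	/*
	 * Reached only if the hook erroneously returned. If die() were
	 * itself declared noreturn, the compiler could legally delete
	 * this trap, which is exactly the concern above.
	 */
	fprintf(stderr, "hook returned unexpectedly\n");
	abort();
}

int main(void)
{
	struct cpu_ops ops = { .die = buggy_die };
	die_wrapper(&ops);	/* aborts, demonstrating the safety net */
}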


Re: [PATCH v3 03/24] arm: Remove COMMAND_LINE_SIZE from uapi

2023-02-15 Thread Arnd Bergmann
On Wed, Feb 15, 2023, at 13:59, Russell King (Oracle) wrote:
> On Tue, Feb 14, 2023 at 08:49:04AM +0100, Alexandre Ghiti wrote:
>> From: Palmer Dabbelt 
>> 
>> As far as I can tell this is not used by userspace and thus should not
>> be part of the user-visible API.
>> 
>> Signed-off-by: Palmer Dabbelt 
>
> Looks good to me. What's the merge plan for this?

The easiest way is probably for me to merge the whole series
through the asm-generic tree. The timing is a bit
unfortunate as we're just ahead of the merge window, so unless
we really need this in 6.3, I'd suggest that Alexandre resend
the series to me in two weeks with the Acks added in and I'll
pick it up for 6.4.

 Arnd


Re: [PATCH v3 03/24] arm: Remove COMMAND_LINE_SIZE from uapi

2023-02-15 Thread Russell King (Oracle)
On Tue, Feb 14, 2023 at 08:49:04AM +0100, Alexandre Ghiti wrote:
> From: Palmer Dabbelt 
> 
> As far as I can tell this is not used by userspace and thus should not
> be part of the user-visible API.
> 
> Signed-off-by: Palmer Dabbelt 

Looks good to me. What's the merge plan for this?

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC

2023-02-15 Thread Christophe Leroy


On 15/02/2023 at 13:45, kernel test robot wrote:
> Hi Herve,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on broonie-sound/for-next]
> [also build test ERROR on robh/for-next powerpc/next powerpc/fixes linus/master v6.2-rc8 next-20230215]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:
> https://github.com/intel-lab-lkp/linux/commits/Herve-Codina/dt-bindings-soc-fsl-cpm_qe-Add-TSA-controller/20230128-152424
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next
> patch link:
> https://lore.kernel.org/r/20230126083222.374243-7-herve.codina%40bootlin.com
> patch subject: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC
> config: powerpc-allyesconfig (https://download.01.org/0day-ci/archive/20230215/202302152037.nxhi2afy-...@intel.com/config)
> compiler: powerpc-linux-gcc (GCC) 12.1.0
> reproduce (this is a W=1 build):
>  wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>  chmod +x ~/bin/make.cross
>  # https://github.com/intel-lab-lkp/linux/commit/20ec2eacb76ca7252aa2934f53357663652edd0f
>  git remote add linux-review https://github.com/intel-lab-lkp/linux
>  git fetch --no-tags linux-review Herve-Codina/dt-bindings-soc-fsl-cpm_qe-Add-TSA-controller/20230128-152424
>  git checkout 20ec2eacb76ca7252aa2934f53357663652edd0f
>  # save the config file
>  mkdir build_dir && cp config build_dir/.config
>  COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
>  COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot 
> | Link: https://lore.kernel.org/oe-kbuild-all/202302152037.nxhi2afy-...@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
> powerpc-linux-ld: drivers/soc/fsl/qe/qmc.o: in function `qmc_probe':
>>> qmc.c:(.text.qmc_probe+0xd8): undefined reference to `get_immrbase'
> 

I guess a dependency on CONFIG_FSL_SOC needs to be added.

Christophe
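
A minimal sketch of how that dependency could be expressed, assuming
the driver's Kconfig symbol is the hypothetical CPM_QMC
(get_immrbase() comes from code that is only built when FSL_SOC is
set):

# Sketch only; the symbol name and prompt are assumptions, not the
# actual Kconfig entry from the series.
config CPM_QMC
	tristate "CPM/QE QMC (QUICC Multichannel Controller) support"
	depends on FSL_SOC
	help
	  QMC driver; it calls get_immrbase(), which is only available
	  when FSL_SOC is enabled.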


Re: [PATCH] powerpc/rtas: Drop unused export symbols

2023-02-15 Thread Michael Ellerman
On Fri, 27 Jan 2023 22:12:31 +1100, Michael Ellerman wrote:
> Some RTAS symbols are never used by modular code, drop their exports.
> 
> 

Applied to powerpc/next.

[1/1] powerpc/rtas: Drop unused export symbols
  https://git.kernel.org/powerpc/c/0d7e812fd282bf248b54523cc550a34b77c2e9a2

cheers
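
The change itself is mechanical: an exported symbol that no module
references simply loses its EXPORT_SYMBOL() line. In miniature, with
an invented symbol name:

#include <linux/export.h>

int rtas_example_helper(void)		/* hypothetical symbol */
{
	return 0;
}
/* EXPORT_SYMBOL(rtas_example_helper); removed: with no modular user,
 * the entry no longer bloats the kernel's exported-symbol table. */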


Re: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC

2023-02-15 Thread kernel test robot
Hi Herve,

I love your patch! Yet something to improve:

[auto build test ERROR on broonie-sound/for-next]
[also build test ERROR on robh/for-next powerpc/next powerpc/fixes linus/master v6.2-rc8 next-20230215]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Herve-Codina/dt-bindings-soc-fsl-cpm_qe-Add-TSA-controller/20230128-152424
base:   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next
patch link:
https://lore.kernel.org/r/20230126083222.374243-7-herve.codina%40bootlin.com
patch subject: [PATCH v4 06/10] soc: fsl: cmp1: Add support for QMC
config: powerpc-allyesconfig (https://download.01.org/0day-ci/archive/20230215/202302152037.nxhi2afy-...@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/20ec2eacb76ca7252aa2934f53357663652edd0f
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Herve-Codina/dt-bindings-soc-fsl-cpm_qe-Add-TSA-controller/20230128-152424
git checkout 20ec2eacb76ca7252aa2934f53357663652edd0f
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Link: https://lore.kernel.org/oe-kbuild-all/202302152037.nxhi2afy-...@intel.com/

All errors (new ones prefixed by >>):

   powerpc-linux-ld: drivers/soc/fsl/qe/qmc.o: in function `qmc_probe':
>> qmc.c:(.text.qmc_probe+0xd8): undefined reference to `get_immrbase'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


Re: [PATCH v3 0/5] powerpc: Add KCSAN support

2023-02-15 Thread Michael Ellerman
On Mon, 6 Feb 2023 13:17:56 +1100, Rohan McLure wrote:
> Add Kernel Concurrency Sanitiser support for PPC64. Doing so involves
> exclusion of a number of compilation units from instrumentation, as was
> done with KASAN.
> 
> KCSAN uses watchpoints on memory accesses to enforce the semantics of
> the Linux kernel memory model, notifying the user of observed data races
> which have not been declared to be intended in source through the
> data_race() macro, in order to remove false positives.
> 
> [...]

Applied to powerpc/next.

[1/5] powerpc: kcsan: Add exclusions from instrumentation
  https://git.kernel.org/powerpc/c/2fb857bc9f9e106439017ed323f522cc785395bb
[2/5] powerpc: kcsan: Exclude udelay to prevent recursive instrumentation
  https://git.kernel.org/powerpc/c/2a7ce82dc46c591c9244057d89a6591c9639b9b9
[3/5] powerpc: kcsan: Memory barriers semantics
  https://git.kernel.org/powerpc/c/b6e259297a6bffb882d55715284bb5219eefda42
[4/5] powerpc: kcsan: Prevent recursive instrumentation with IRQ save/restores
  https://git.kernel.org/powerpc/c/4f8e09106f6e457c6e9a4ce597fa9ae2bda032c3
[5/5] powerpc: kcsan: Add KCSAN Support
  https://git.kernel.org/powerpc/c/6f0926c00565a91f3bd7ca1aa05db307daed5e0f

cheers
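
The data_race() annotation mentioned in the cover letter is the real
macro from <linux/compiler.h>; the variable and accessor below are
invented for illustration:

#include <linux/compiler.h>	/* data_race() */

static unsigned long hits;	/* hypothetical lockless counter */

static unsigned long read_hits(void)
{
	/*
	 * The racy read is declared intentional, so KCSAN will not
	 * report it; an unannotated plain read here could be flagged.
	 */
	return data_race(hits);
}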

