date:20240505

Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

2024-05-05 Thread maobibo





On 2024/5/6 上午9:49, Huacai Chen wrote:

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:


Physical cpuid is used for interrupt routing for irqchips such as
ipi/msi/extioi interrupt controller. And physical cpuid is stored
at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
is created and physical cpuid of two vcpus cannot be the same.

Different irqchips have different size declaration about physical cpuid,
max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
for MSI irqchip.

The smallest value from all interrupt controllers is selected now,
and the max cpuid size is defines as 256 by KVM which comes from
extioi irqchip.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/include/asm/kvm_host.h | 26 
  arch/loongarch/include/asm/kvm_vcpu.h |  1 +
  arch/loongarch/kvm/vcpu.c | 93 ++-
  arch/loongarch/kvm/vm.c   | 11 
  4 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 2d62f7b0d377..3ba16ef1fe69 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -64,6 +64,30 @@ struct kvm_world_switch {

  #define MAX_PGTABLE_LEVELS 4

+/*
+ * Physical cpu id is used for interrupt routing, there are different
+ * definitions about physical cpuid on different hardwares.
+ *  For LOONGARCH_CSR_CPUID register, max cpuid size if 512
+ *  For IPI HW, max dest CPUID size 1024
+ *  For extioi interrupt controller, max dest CPUID size is 256
+ *  For MSI interrupt controller, max supported CPUID size is 65536
+ *
+ * Currently max CPUID is defined as 256 for KVM hypervisor, in future
+ * it will be expanded to 4096, including 16 packages at most. And every
+ * package supports at most 256 vcpus
+ */
+#define KVM_MAX_PHYID  256
+
+struct kvm_phyid_info {
+   struct kvm_vcpu *vcpu;
+   boolenabled;
+};
+
+struct kvm_phyid_map {
+   int max_phyid;
+   struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
+};
+
  struct kvm_arch {
 /* Guest physical mm */
 kvm_pte_t *pgd;
@@ -71,6 +95,8 @@ struct kvm_arch {
 unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
 unsigned int  pte_shifts[MAX_PGTABLE_LEVELS];
 unsigned int  root_level;
+   spinlock_tphyid_map_lock;
+   struct kvm_phyid_map  *phyid_map;

 s64 time_offset;
 struct kvm_context __percpu *vmcs;
diff --git a/arch/loongarch/include/asm/kvm_vcpu.h 
b/arch/loongarch/include/asm/kvm_vcpu.h
index 0cb4fdb8a9b5..9f53950959da 100644
--- a/arch/loongarch/include/asm/kvm_vcpu.h
+++ b/arch/loongarch/include/asm/kvm_vcpu.h
@@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
  void kvm_restore_timer(struct kvm_vcpu *vcpu);

  int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt 
*irq);
+struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);

  /*
   * Loongarch KVM guest interrupt handling
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 3a8779065f73..b633fd28b8db 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int 
id, u64 *val)
 return 0;
  }

+static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
+{
+   int cpuid;
+   struct loongarch_csrs *csr = vcpu->arch.csr;
+   struct kvm_phyid_map  *map;
+
+   if (val >= KVM_MAX_PHYID)
+   return -EINVAL;
+
+   cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_ESTAT);
+   map = vcpu->kvm->arch.phyid_map;
+   spin_lock(&vcpu->kvm->arch.phyid_map_lock);
+   if (map->phys_map[cpuid].enabled) {
+   /*
+* Cpuid is already set before
+* Forbid changing different cpuid at runtime
+*/
+   if (cpuid != val) {
+   /*
+* Cpuid 0 is initial value for vcpu, maybe invalid
+* unset value for vcpu
+*/
+   if (cpuid) {
+   spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+   return -EINVAL;
+   }
+   } else {
+/* Discard duplicated cpuid set */
+   spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+   return 0;
+   }
+   }

I have changed the logic and comments when I apply, you can double
check whether it is correct.
I checkout the latest version, the modification in function 
kvm_set_cpuid() is good for me.



+
+   if (map->phys_map[val].enabled) {
+   /*
+* New cpuid is already set with other vcpu
+* Forbid sharing the same cpuid between different

Re: [PATCH v8 0/6] LoongArch: Add pv ipi support on LoongArch VM

2024-05-05 Thread maobibo





On 2024/5/6 上午9:45, Huacai Chen wrote:

Hi, Bibo,

I have done an off-list discussion with some KVM experts, and they
think user-space have its right to know PV features, so cpucfg
solution is acceptable.

And I applied this series with some modifications at
https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/log/?h=loongarch-kvm
You can test it now. But it seems the upstream qemu cannot enable PV IPI now.
VM with 128/256 vcpus boots with this series in loongarch-kvm branch. 
And pv ipi works by information "cat /proc/interrupts". There need small 
modification with qemu like this, and we

will submit the patch to qemu after it is merged.

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 441d764843..9f7556cd93 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -15,6 +15,8 @@
 #include "sysemu/runstate.h"
 #include "sysemu/reset.h"
 #include "sysemu/rtc.h"
+#include "sysemu/tcg.h"
+#include "sysemu/kvm.h"
 #include "hw/loongarch/virt.h"
 #include "exec/address-spaces.h"
 #include "hw/irq.h"
@@ -786,12 +788,18 @@ static void loongarch_qemu_write(void *opaque, 
hwaddr addr,


 static uint64_t loongarch_qemu_read(void *opaque, hwaddr addr, 
unsigned size)

 {
+uint64_t ret = 0;
+
 switch (addr) {
 case VERSION_REG:
 return 0x11ULL;
 case FEATURE_REG:
-return 1ULL << IOCSRF_MSI | 1ULL << IOCSRF_EXTIOI |
+ret =1ULL << IOCSRF_MSI | 1ULL << IOCSRF_EXTIOI |
1ULL << IOCSRF_CSRIPI;
+if (kvm_enabled()) {
+ret |= 1ULL << IOCSRF_VM;
+}
+   return ret;
 case VENDOR_REG:
 return 0x6e6f73676e6f6f4cULL; /* "Loongson" */
 case CPUNAME_REG:


Regards
Bibo Mao


I will reply to other patches about my modifications.

Huacai

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:


On physical machine, ipi HW uses IOCSR registers, however there is trap
into hypervisor when vcpu accesses IOCSR registers if system is in VM
mode. SWI is a interrupt mechanism like SGI on ARM, software can send
interrupt to CPU, only that on LoongArch SWI can only be sent to local CPU
now. So SWI can not used for IPI on real HW system, however it can be used
on VM when combined with hypercall method. IPI can be sent with hypercall
method and SWI interrupt is injected to vcpu, vcpu can treat SWI
interrupt as IPI.

With PV IPI supported, there is one trap with IPI sending, however with IPI
receiving there is no trap. with IOCSR HW ipi method, there will be one
trap with IPI sending and two trap with ipi receiving.

Also IPI multicast support is added for VM, the idea comes from x86 PV ipi.
IPI can be sent to 128 vcpus in one time. With IPI multicast support, trap
will be reduced greatly.

Here is the microbenchmarck data with "perf bench futex wake" testcase on
3C5000 single-way machine, there are 16 cpus on 3C5000 single-way machine,
VM has 16 vcpus also. The benchmark data is ms time unit to wakeup 16
threads, the performance is better if data is smaller.

physical machine 0.0176 ms
VM original  0.1140 ms
VM with pv ipi patch 0.0481 ms

It passes to boot with 128/256 vcpus, and passes to run runltp command
with package ltp-20230516.

---
v7 --- v8:
  1. Remove kernel PLV mode checking with cpucfg emulation for hypervisor
feature inquiry.
  2. Remove document about loongarch hypercall ABI per request of huacai,
will add English/Chinese doc at the same time in later.

v6 --- v7:
   1. Refine LoongArch virt document by review comments.
   2. Add function kvm_read_reg()/kvm_write_reg() in hypercall emulation,
and later it can be used for other trap emulations.

v5 --- v6:
   1. Add privilege checking when emulating cpucfg at index 0x400 --
0x40FF, return 0 if not executed at kernel mode.
   2. Add document about LoongArch pv ipi with new creatly directory
Documentation/virt/kvm/loongarch/
   3. Fix pv ipi handling in kvm backend function kvm_pv_send_ipi(),
where min should plus BITS_PER_LONG with second bitmap, otherwise
VM with more than 64 vpus fails to boot.
   4. Adjust patch order and code refine with review comments.

v4 --- v5:
   1. Refresh function/macro name from review comments.

v3 --- v4:
   1. Modfiy pv ipi hook function name call_func_ipi() and
call_func_single_ipi() with send_ipi_mask()/send_ipi_single(), since pv
ipi is used for both remote function call and reschedule notification.
   2. Refresh changelog.

v2 --- v3:
   1. Add 128 vcpu ipi multicast support like x86
   2. Change cpucfg base address from 0x1000 to 0x4000, in order
to avoid confliction with future hw usage
   3. Adjust patch order in this patchset, move patch
Refine-ipi-ops-on-LoongArch-platform to the first one.

v1 --- v2:
   1. Add hw cpuid map support since ipi routing uses hw cpuid
   2. Refine changelog description
   3. Add hypercall statistic support for vcpu
   4. Set percpu pv ipi message buffer aligned with cacheline
   5. Refine pv ipi

[PATCH] kernel/module: disable cfi for do_mod_ctors

2024-05-05 Thread Joey Jiao

CFI failure when both CONFIG_CONSTRUCTORS and CFI_CLANG enabled.

CFI failure at do_init_module+0x100/0x384 (target:
tsan.module_ctor+0x0/0xa98 [module_name_xx]; expected type: 0xa540670c)

Disable cfi for do_mod_ctors to avoid cfi check on mod->ctors[i]().

Signed-off-by: Joey Jiao 
---
 kernel/module/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..d51e63795637 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2453,6 +2453,7 @@ static int post_relocation(struct module *mod, const 
struct load_info *info)
 }
 
 /* Call module constructors. */
+__nocfi
 static void do_mod_ctors(struct module *mod)
 {
 #ifdef CONFIG_CONSTRUCTORS
-- 
2.43.2

Re: [PATCH] virtio_net: Warn if insufficient queue length for transmitting

2024-05-05 Thread Jason Wang

On Wed, May 1, 2024 at 4:07 AM Michael S. Tsirkin  wrote:
>
> On Tue, Apr 30, 2024 at 03:35:09PM -0400, Darius Rad wrote:
> > The transmit queue is stopped when the number of free queue entries is less
> > than 2+MAX_SKB_FRAGS, in start_xmit().  If the queue length (QUEUE_NUM_MAX)
> > is less than then this, transmission will immediately trigger a netdev
> > watchdog timeout.  Report this condition earlier and more directly.
> >
> > Signed-off-by: Darius Rad 
> > ---
> >  drivers/net/virtio_net.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 115c3c5414f2..72ee8473b61c 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -4917,6 +4917,9 @@ static int virtnet_probe(struct virtio_device *vdev)
> >   set_bit(guest_offloads[i], &vi->guest_offloads);
> >   vi->guest_offloads_capable = vi->guest_offloads;
> >
> > + if (virtqueue_get_vring_size(vi->sq->vq) < 2 + MAX_SKB_FRAGS)
> > + netdev_warn_once(dev, "not enough queue entries, expect xmit 
> > timeout\n");
> > +
>
> How about actually fixing it though? E.g. by linearizing...

Actually, the linearing is only needed for the case when the indirect
descriptor is not supported.

>
> It also bothers me that there's practically
> /proc/sys/net/core/max_skb_frags
> and if that's low then things could actually work.

Probably not as it won't exceed MAX_SKB_FRAGS.

>
> Finally, while originally it was just 17 typically, now it's
> configurable. So it's possible that you change the config to make big
> tcp

Note that virtio-net doesn't fully support big TCP.

> work better and device stops working while it worked fine
> previously.

For this patch, I guess not as we had:

if (sq->vq->num_free < 2+MAX_SKB_FRAGS)

in the tx path. So it won't even work before this patch.

Thanks

>
>
> >   pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
> >dev->name, max_queue_pairs);
> >
> > --
> > 2.39.2
>

Re: [PATCH v8 2/6] LoongArch: KVM: Add hypercall instruction emulation support

2024-05-05 Thread maobibo





On 2024/5/6 上午9:54, Huacai Chen wrote:

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:


On LoongArch system, there is hypercall instruction special for
virtualization. When system executes this instruction on host side,
there is illegal instruction exception reported, however it will
trap into host when it is executed in VM mode.

When hypercall is emulated, A0 register is set with value
KVM_HCALL_INVALID_CODE, rather than inject EXCCODE_INE invalid
instruction exception. So VM can continue to executing the next code.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/include/asm/Kbuild  |  1 -
  arch/loongarch/include/asm/kvm_para.h  | 26 ++
  arch/loongarch/include/uapi/asm/Kbuild |  2 --
  arch/loongarch/kvm/exit.c  | 10 ++
  4 files changed, 36 insertions(+), 3 deletions(-)
  create mode 100644 arch/loongarch/include/asm/kvm_para.h
  delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild

diff --git a/arch/loongarch/include/asm/Kbuild 
b/arch/loongarch/include/asm/Kbuild
index 2dbec7853ae8..c862672ed953 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -26,4 +26,3 @@ generic-y += poll.h
  generic-y += param.h
  generic-y += posix_types.h
  generic-y += resource.h
-generic-y += kvm_para.h
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
new file mode 100644
index ..d48f993ae206
--- /dev/null
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_KVM_PARA_H
+#define _ASM_LOONGARCH_KVM_PARA_H
+
+/*
+ * LoongArch hypercall return code
+ */
+#define KVM_HCALL_STATUS_SUCCESS   0
+#define KVM_HCALL_INVALID_CODE -1UL
+#define KVM_HCALL_INVALID_PARAMETER-2UL
+
+static inline unsigned int kvm_arch_para_features(void)
+{
+   return 0;
+}
+
+static inline unsigned int kvm_arch_para_hints(void)
+{
+   return 0;
+}
+
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+   return false;
+}
+#endif /* _ASM_LOONGARCH_KVM_PARA_H */
diff --git a/arch/loongarch/include/uapi/asm/Kbuild 
b/arch/loongarch/include/uapi/asm/Kbuild
deleted file mode 100644
index 4aa680ca2e5f..
--- a/arch/loongarch/include/uapi/asm/Kbuild
+++ /dev/null
@@ -1,2 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-generic-y += kvm_para.h

This file shouldn't be removed.
yes, uapi kvm_param.h is needed for Loongarch, and there will be problem 
if it is removed. And it should kept unchanged.


Regards
Bibo Mao


Huacai


diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index ed1d89d53e2e..923bbca9bd22 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
 return RESUME_GUEST;
  }

+static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
+{
+   update_pc(&vcpu->arch);
+
+   /* Treat it as noop intruction, only set return value */
+   vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
+   return RESUME_GUEST;
+}
+
  /*
   * LoongArch KVM callback handling for unimplemented guest exiting
   */
@@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] = 
{
 [EXCCODE_LSXDIS]= kvm_handle_lsx_disabled,
 [EXCCODE_LASXDIS]   = kvm_handle_lasx_disabled,
 [EXCCODE_GSPR]  = kvm_handle_gspr,
+   [EXCCODE_HVC]   = kvm_handle_hypercall,
  };

  int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)
--
2.39.3

Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

2024-05-05 Thread maobibo


Huacai,

Many thanks for reviewing pv ipi patchset.
And I reply inline.

On 2024/5/6 上午9:49, Huacai Chen wrote:

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:


Physical cpuid is used for interrupt routing for irqchips such as
ipi/msi/extioi interrupt controller. And physical cpuid is stored
at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
is created and physical cpuid of two vcpus cannot be the same.

Different irqchips have different size declaration about physical cpuid,
max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
for MSI irqchip.

The smallest value from all interrupt controllers is selected now,
and the max cpuid size is defines as 256 by KVM which comes from
extioi irqchip.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/include/asm/kvm_host.h | 26 
  arch/loongarch/include/asm/kvm_vcpu.h |  1 +
  arch/loongarch/kvm/vcpu.c | 93 ++-
  arch/loongarch/kvm/vm.c   | 11 
  4 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 2d62f7b0d377..3ba16ef1fe69 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -64,6 +64,30 @@ struct kvm_world_switch {

  #define MAX_PGTABLE_LEVELS 4

+/*
+ * Physical cpu id is used for interrupt routing, there are different
+ * definitions about physical cpuid on different hardwares.
+ *  For LOONGARCH_CSR_CPUID register, max cpuid size if 512
+ *  For IPI HW, max dest CPUID size 1024
+ *  For extioi interrupt controller, max dest CPUID size is 256
+ *  For MSI interrupt controller, max supported CPUID size is 65536
+ *
+ * Currently max CPUID is defined as 256 for KVM hypervisor, in future
+ * it will be expanded to 4096, including 16 packages at most. And every
+ * package supports at most 256 vcpus
+ */
+#define KVM_MAX_PHYID  256
+
+struct kvm_phyid_info {
+   struct kvm_vcpu *vcpu;
+   boolenabled;
+};
+
+struct kvm_phyid_map {
+   int max_phyid;
+   struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
+};
+
  struct kvm_arch {
 /* Guest physical mm */
 kvm_pte_t *pgd;
@@ -71,6 +95,8 @@ struct kvm_arch {
 unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
 unsigned int  pte_shifts[MAX_PGTABLE_LEVELS];
 unsigned int  root_level;
+   spinlock_tphyid_map_lock;
+   struct kvm_phyid_map  *phyid_map;

 s64 time_offset;
 struct kvm_context __percpu *vmcs;
diff --git a/arch/loongarch/include/asm/kvm_vcpu.h 
b/arch/loongarch/include/asm/kvm_vcpu.h
index 0cb4fdb8a9b5..9f53950959da 100644
--- a/arch/loongarch/include/asm/kvm_vcpu.h
+++ b/arch/loongarch/include/asm/kvm_vcpu.h
@@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
  void kvm_restore_timer(struct kvm_vcpu *vcpu);

  int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt 
*irq);
+struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);

  /*
   * Loongarch KVM guest interrupt handling
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 3a8779065f73..b633fd28b8db 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int 
id, u64 *val)
 return 0;
  }

+static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
+{
+   int cpuid;
+   struct loongarch_csrs *csr = vcpu->arch.csr;
+   struct kvm_phyid_map  *map;
+
+   if (val >= KVM_MAX_PHYID)
+   return -EINVAL;
+
+   cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_ESTAT);
+   map = vcpu->kvm->arch.phyid_map;
+   spin_lock(&vcpu->kvm->arch.phyid_map_lock);
+   if (map->phys_map[cpuid].enabled) {
+   /*
+* Cpuid is already set before
+* Forbid changing different cpuid at runtime
+*/
+   if (cpuid != val) {
+   /*
+* Cpuid 0 is initial value for vcpu, maybe invalid
+* unset value for vcpu
+*/
+   if (cpuid) {
+   spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+   return -EINVAL;
+   }
+   } else {
+/* Discard duplicated cpuid set */
+   spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
+   return 0;
+   }
+   }

I have changed the logic and comments when I apply, you can double
check whether it is correct.

Will do.




+
+   if (map->phys_map[val].enabled) {
+   /*
+* New cpuid is already set with other vcpu
+* Forbid sharing the same cpuid between different vcpus
+

Re: [PATCH] livepatch.h: Add comment to klp transition state

2024-05-05 Thread zhang warden




> On May 6, 2024, at 05:00, Josh Poimboeuf  wrote:
> 
> On Mon, Apr 29, 2024 at 03:26:28PM +0800, zhangwar...@gmail.com wrote:
>> From: Wardenjohn 
>> 
>> livepatch.h use KLP_UNDEFINED\KLP_UNPATCHED\KLP_PATCHED for klp transition 
>> state.
>> When livepatch is ready but idle, using KLP_UNDEFINED seems very confusing.
>> In order not to introduce potential risks to kernel, just update comment
>> to these state.
>> ---
>> include/linux/livepatch.h | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
>> index 9b9b38e89563..b6a214f2f8e3 100644
>> --- a/include/linux/livepatch.h
>> +++ b/include/linux/livepatch.h
>> @@ -18,9 +18,9 @@
>> #if IS_ENABLED(CONFIG_LIVEPATCH)
>> 
>> /* task patch states */
>> -#define KLP_UNDEFINED -1
>> -#define KLP_UNPATCHED  0
>> -#define KLP_PATCHED  1
>> +#define KLP_UNDEFINED -1 /* idle, no transition in progress */
>> +#define KLP_UNPATCHED  0 /* transitioning to unpatched state */
>> +#define KLP_PATCHED  1 /* transitioning to patched state */
> 
> Instead of the comments, how about we just rename them to
> 
>  KLP_TRANSITION_IDLE
>  KLP_TRANSITION_UNPATCHED
>  KLP_TRANSITION_PATCHED
> 
> which shouldn't break userspace AFAIK.
> 
> -- 
> Josh

Hi Josh!

Renaming them may be a better way as my previous patch. I would like to know 
why renaming KLP_*** into 
KLP_TRANSITION_*** will not break userspace while 
Renaming KLP_UNDEWFINED to KLP_IDLE would break the userspace.

Meanwhile, I will resubmit another patch renaming the macros as your suggestion 
ASAP. Thank ~~ :)

--
Wardenjohn

Re: [PATCH v8 2/6] LoongArch: KVM: Add hypercall instruction emulation support

2024-05-05 Thread Huacai Chen

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:
>
> On LoongArch system, there is hypercall instruction special for
> virtualization. When system executes this instruction on host side,
> there is illegal instruction exception reported, however it will
> trap into host when it is executed in VM mode.
>
> When hypercall is emulated, A0 register is set with value
> KVM_HCALL_INVALID_CODE, rather than inject EXCCODE_INE invalid
> instruction exception. So VM can continue to executing the next code.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/include/asm/Kbuild  |  1 -
>  arch/loongarch/include/asm/kvm_para.h  | 26 ++
>  arch/loongarch/include/uapi/asm/Kbuild |  2 --
>  arch/loongarch/kvm/exit.c  | 10 ++
>  4 files changed, 36 insertions(+), 3 deletions(-)
>  create mode 100644 arch/loongarch/include/asm/kvm_para.h
>  delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild
>
> diff --git a/arch/loongarch/include/asm/Kbuild 
> b/arch/loongarch/include/asm/Kbuild
> index 2dbec7853ae8..c862672ed953 100644
> --- a/arch/loongarch/include/asm/Kbuild
> +++ b/arch/loongarch/include/asm/Kbuild
> @@ -26,4 +26,3 @@ generic-y += poll.h
>  generic-y += param.h
>  generic-y += posix_types.h
>  generic-y += resource.h
> -generic-y += kvm_para.h
> diff --git a/arch/loongarch/include/asm/kvm_para.h 
> b/arch/loongarch/include/asm/kvm_para.h
> new file mode 100644
> index ..d48f993ae206
> --- /dev/null
> +++ b/arch/loongarch/include/asm/kvm_para.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_KVM_PARA_H
> +#define _ASM_LOONGARCH_KVM_PARA_H
> +
> +/*
> + * LoongArch hypercall return code
> + */
> +#define KVM_HCALL_STATUS_SUCCESS   0
> +#define KVM_HCALL_INVALID_CODE -1UL
> +#define KVM_HCALL_INVALID_PARAMETER-2UL
> +
> +static inline unsigned int kvm_arch_para_features(void)
> +{
> +   return 0;
> +}
> +
> +static inline unsigned int kvm_arch_para_hints(void)
> +{
> +   return 0;
> +}
> +
> +static inline bool kvm_check_and_clear_guest_paused(void)
> +{
> +   return false;
> +}
> +#endif /* _ASM_LOONGARCH_KVM_PARA_H */
> diff --git a/arch/loongarch/include/uapi/asm/Kbuild 
> b/arch/loongarch/include/uapi/asm/Kbuild
> deleted file mode 100644
> index 4aa680ca2e5f..
> --- a/arch/loongarch/include/uapi/asm/Kbuild
> +++ /dev/null
> @@ -1,2 +0,0 @@
> -# SPDX-License-Identifier: GPL-2.0
> -generic-y += kvm_para.h
This file shouldn't be removed.

Huacai

> diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
> index ed1d89d53e2e..923bbca9bd22 100644
> --- a/arch/loongarch/kvm/exit.c
> +++ b/arch/loongarch/kvm/exit.c
> @@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu 
> *vcpu)
> return RESUME_GUEST;
>  }
>
> +static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
> +{
> +   update_pc(&vcpu->arch);
> +
> +   /* Treat it as noop intruction, only set return value */
> +   vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HCALL_INVALID_CODE;
> +   return RESUME_GUEST;
> +}
> +
>  /*
>   * LoongArch KVM callback handling for unimplemented guest exiting
>   */
> @@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] 
> = {
> [EXCCODE_LSXDIS]= kvm_handle_lsx_disabled,
> [EXCCODE_LASXDIS]   = kvm_handle_lasx_disabled,
> [EXCCODE_GSPR]  = kvm_handle_gspr,
> +   [EXCCODE_HVC]   = kvm_handle_hypercall,
>  };
>
>  int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)
> --
> 2.39.3
>
>

Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-05-05 Thread Huacai Chen

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:
>
> PARAVIRT option and pv ipi is added on guest kernel side, function
> pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
> firstly checks whether system runs on VM mode. If kernel runs on VM mode,
> it will call function kvm_para_available() to detect current hypervirsor
> type. Now only KVM type detection is supported, the paravirt function can
> work only if current hypervisor type is KVM, since there is only KVM
> supported on LoongArch now.
>
> PV IPI uses virtual IPI sender and virtual IPI receiver function. With
> virutal IPI sender, ipi message is stored in DDR memory rather than
> emulated HW. IPI multicast is supported, and 128 vcpus can received IPIs
> at the same time like X86 KVM method. Hypercall method is used for IPI
> sending.
>
> With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
> VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
> acknowledge. And IPI message is stored in DDR, no trap in get IPI message.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/Kconfig|   9 ++
>  arch/loongarch/include/asm/hardirq.h  |   1 +
>  arch/loongarch/include/asm/paravirt.h |  27 
>  .../include/asm/paravirt_api_clock.h  |   1 +
>  arch/loongarch/kernel/Makefile|   1 +
>  arch/loongarch/kernel/irq.c   |   2 +-
>  arch/loongarch/kernel/paravirt.c  | 151 ++
>  arch/loongarch/kernel/smp.c   |   4 +-
>  8 files changed, 194 insertions(+), 2 deletions(-)
>  create mode 100644 arch/loongarch/include/asm/paravirt.h
>  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>  create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 54ad04dacdee..0a1540a8853e 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> +   bool "Enable paravirtualization code"
> +   depends on AS_HAS_LVZ_EXTENSION
> +   help
> +  This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization.  However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
>  config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/hardirq.h 
> b/arch/loongarch/include/asm/hardirq.h
> index 9f0038e19c7f..b26d596a73aa 100644
> --- a/arch/loongarch/include/asm/hardirq.h
> +++ b/arch/loongarch/include/asm/hardirq.h
> @@ -21,6 +21,7 @@ enum ipi_msg_type {
>  typedef struct {
> unsigned int ipi_irqs[NR_IPI];
> unsigned int __softirq_pending;
> +   atomic_t message cacheline_aligned_in_smp;
>  } cacheline_aligned irq_cpustat_t;
>
>  DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> diff --git a/arch/loongarch/include/asm/paravirt.h 
> b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index ..58f7b7b89f2c
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include 
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> +   return static_call(pv_steal_clock)(cpu);
> +}
> +
> +int pv_ipi_init(void);
> +#else
> +static inline int pv_ipi_init(void)
> +{
> +   return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
> b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index ..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include 
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3a7620b66bc6..c9bfeda89e40 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>  obj-$(CONFIG_STACKTRACE)   += stacktrace.o
>
>  obj-$(CONFIG_PROC_FS)  += proc.o
> +obj-$(CONFIG_PARAVIRT) += paravirt.o
>
>  obj-$(CONFIG_SMP)  += smp.o
>
> diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
> index ce36897d1e5a..4863e6c1b739 100644
> --- a/arch/loongarch/kernel/irq.c
> +++ b/arch/loongarch/kernel/irq.c
> @@ -113,5 +113,5 @@ void __init init_IRQ(void)
> per_cpu(irq_sta

Re: [PATCH v8 4/6] LoongArch: KVM: Add vcpu search support from physical cpuid

2024-05-05 Thread Huacai Chen

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:
>
> Physical cpuid is used for interrupt routing for irqchips such as
> ipi/msi/extioi interrupt controller. And physical cpuid is stored
> at CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
> is created and physical cpuid of two vcpus cannot be the same.
>
> Different irqchips have different size declaration about physical cpuid,
> max cpuid value for CSR LOONGARCH_CSR_CPUID on 3A5000 is 512, max cpuid
> supported by IPI hardware is 1024, 256 for extioi irqchip, and 65536
> for MSI irqchip.
>
> The smallest value from all interrupt controllers is selected now,
> and the max cpuid size is defines as 256 by KVM which comes from
> extioi irqchip.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/include/asm/kvm_host.h | 26 
>  arch/loongarch/include/asm/kvm_vcpu.h |  1 +
>  arch/loongarch/kvm/vcpu.c | 93 ++-
>  arch/loongarch/kvm/vm.c   | 11 
>  4 files changed, 130 insertions(+), 1 deletion(-)
>
> diff --git a/arch/loongarch/include/asm/kvm_host.h 
> b/arch/loongarch/include/asm/kvm_host.h
> index 2d62f7b0d377..3ba16ef1fe69 100644
> --- a/arch/loongarch/include/asm/kvm_host.h
> +++ b/arch/loongarch/include/asm/kvm_host.h
> @@ -64,6 +64,30 @@ struct kvm_world_switch {
>
>  #define MAX_PGTABLE_LEVELS 4
>
> +/*
> + * Physical cpu id is used for interrupt routing, there are different
> + * definitions about physical cpuid on different hardwares.
> + *  For LOONGARCH_CSR_CPUID register, max cpuid size if 512
> + *  For IPI HW, max dest CPUID size 1024
> + *  For extioi interrupt controller, max dest CPUID size is 256
> + *  For MSI interrupt controller, max supported CPUID size is 65536
> + *
> + * Currently max CPUID is defined as 256 for KVM hypervisor, in future
> + * it will be expanded to 4096, including 16 packages at most. And every
> + * package supports at most 256 vcpus
> + */
> +#define KVM_MAX_PHYID  256
> +
> +struct kvm_phyid_info {
> +   struct kvm_vcpu *vcpu;
> +   boolenabled;
> +};
> +
> +struct kvm_phyid_map {
> +   int max_phyid;
> +   struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
> +};
> +
>  struct kvm_arch {
> /* Guest physical mm */
> kvm_pte_t *pgd;
> @@ -71,6 +95,8 @@ struct kvm_arch {
> unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
> unsigned int  pte_shifts[MAX_PGTABLE_LEVELS];
> unsigned int  root_level;
> +   spinlock_tphyid_map_lock;
> +   struct kvm_phyid_map  *phyid_map;
>
> s64 time_offset;
> struct kvm_context __percpu *vmcs;
> diff --git a/arch/loongarch/include/asm/kvm_vcpu.h 
> b/arch/loongarch/include/asm/kvm_vcpu.h
> index 0cb4fdb8a9b5..9f53950959da 100644
> --- a/arch/loongarch/include/asm/kvm_vcpu.h
> +++ b/arch/loongarch/include/asm/kvm_vcpu.h
> @@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
>  void kvm_restore_timer(struct kvm_vcpu *vcpu);
>
>  int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt 
> *irq);
> +struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
>
>  /*
>   * Loongarch KVM guest interrupt handling
> diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
> index 3a8779065f73..b633fd28b8db 100644
> --- a/arch/loongarch/kvm/vcpu.c
> +++ b/arch/loongarch/kvm/vcpu.c
> @@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned 
> int id, u64 *val)
> return 0;
>  }
>
> +static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
> +{
> +   int cpuid;
> +   struct loongarch_csrs *csr = vcpu->arch.csr;
> +   struct kvm_phyid_map  *map;
> +
> +   if (val >= KVM_MAX_PHYID)
> +   return -EINVAL;
> +
> +   cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_ESTAT);
> +   map = vcpu->kvm->arch.phyid_map;
> +   spin_lock(&vcpu->kvm->arch.phyid_map_lock);
> +   if (map->phys_map[cpuid].enabled) {
> +   /*
> +* Cpuid is already set before
> +* Forbid changing different cpuid at runtime
> +*/
> +   if (cpuid != val) {
> +   /*
> +* Cpuid 0 is initial value for vcpu, maybe invalid
> +* unset value for vcpu
> +*/
> +   if (cpuid) {
> +   spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> +   return -EINVAL;
> +   }
> +   } else {
> +/* Discard duplicated cpuid set */
> +   spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
> +   return 0;
> +   }
> +   }
I have changed the logic and comments when I apply, you can double
check whether it is correct.

> +
> +   if (map->phys_map[val].enabled) {
> +   /*
> +* New cpuid is already set w

Re: [PATCH v8 0/6] LoongArch: Add pv ipi support on LoongArch VM

2024-05-05 Thread Huacai Chen

Hi, Bibo,

I have done an off-list discussion with some KVM experts, and they
think user-space have its right to know PV features, so cpucfg
solution is acceptable.

And I applied this series with some modifications at
https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/log/?h=loongarch-kvm
You can test it now. But it seems the upstream qemu cannot enable PV IPI now.

I will reply to other patches about my modifications.

Huacai

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:
>
> On physical machine, ipi HW uses IOCSR registers, however there is trap
> into hypervisor when vcpu accesses IOCSR registers if system is in VM
> mode. SWI is a interrupt mechanism like SGI on ARM, software can send
> interrupt to CPU, only that on LoongArch SWI can only be sent to local CPU
> now. So SWI can not used for IPI on real HW system, however it can be used
> on VM when combined with hypercall method. IPI can be sent with hypercall
> method and SWI interrupt is injected to vcpu, vcpu can treat SWI
> interrupt as IPI.
>
> With PV IPI supported, there is one trap with IPI sending, however with IPI
> receiving there is no trap. with IOCSR HW ipi method, there will be one
> trap with IPI sending and two trap with ipi receiving.
>
> Also IPI multicast support is added for VM, the idea comes from x86 PV ipi.
> IPI can be sent to 128 vcpus in one time. With IPI multicast support, trap
> will be reduced greatly.
>
> Here is the microbenchmarck data with "perf bench futex wake" testcase on
> 3C5000 single-way machine, there are 16 cpus on 3C5000 single-way machine,
> VM has 16 vcpus also. The benchmark data is ms time unit to wakeup 16
> threads, the performance is better if data is smaller.
>
> physical machine 0.0176 ms
> VM original  0.1140 ms
> VM with pv ipi patch 0.0481 ms
>
> It passes to boot with 128/256 vcpus, and passes to run runltp command
> with package ltp-20230516.
>
> ---
> v7 --- v8:
>  1. Remove kernel PLV mode checking with cpucfg emulation for hypervisor
> feature inquiry.
>  2. Remove document about loongarch hypercall ABI per request of huacai,
> will add English/Chinese doc at the same time in later.
>
> v6 --- v7:
>   1. Refine LoongArch virt document by review comments.
>   2. Add function kvm_read_reg()/kvm_write_reg() in hypercall emulation,
> and later it can be used for other trap emulations.
>
> v5 --- v6:
>   1. Add privilege checking when emulating cpucfg at index 0x400 --
> 0x40FF, return 0 if not executed at kernel mode.
>   2. Add document about LoongArch pv ipi with new creatly directory
> Documentation/virt/kvm/loongarch/
>   3. Fix pv ipi handling in kvm backend function kvm_pv_send_ipi(),
> where min should plus BITS_PER_LONG with second bitmap, otherwise
> VM with more than 64 vpus fails to boot.
>   4. Adjust patch order and code refine with review comments.
>
> v4 --- v5:
>   1. Refresh function/macro name from review comments.
>
> v3 --- v4:
>   1. Modfiy pv ipi hook function name call_func_ipi() and
> call_func_single_ipi() with send_ipi_mask()/send_ipi_single(), since pv
> ipi is used for both remote function call and reschedule notification.
>   2. Refresh changelog.
>
> v2 --- v3:
>   1. Add 128 vcpu ipi multicast support like x86
>   2. Change cpucfg base address from 0x1000 to 0x4000, in order
> to avoid confliction with future hw usage
>   3. Adjust patch order in this patchset, move patch
> Refine-ipi-ops-on-LoongArch-platform to the first one.
>
> v1 --- v2:
>   1. Add hw cpuid map support since ipi routing uses hw cpuid
>   2. Refine changelog description
>   3. Add hypercall statistic support for vcpu
>   4. Set percpu pv ipi message buffer aligned with cacheline
>   5. Refine pv ipi send logic, do not send ipi message with if there is
> pending ipi message.
> ---
> Bibo Mao (6):
>   LoongArch/smp: Refine some ipi functions on LoongArch platform
>   LoongArch: KVM: Add hypercall instruction emulation support
>   LoongArch: KVM: Add cpucfg area for kvm hypervisor
>   LoongArch: KVM: Add vcpu search support from physical cpuid
>   LoongArch: KVM: Add pv ipi support on kvm side
>   LoongArch: Add pv ipi support on guest kernel side
>
>  arch/loongarch/Kconfig|   9 +
>  arch/loongarch/include/asm/Kbuild |   1 -
>  arch/loongarch/include/asm/hardirq.h  |   5 +
>  arch/loongarch/include/asm/inst.h |   1 +
>  arch/loongarch/include/asm/irq.h  |  10 +-
>  arch/loongarch/include/asm/kvm_host.h |  27 +++
>  arch/loongarch/include/asm/kvm_para.h | 155 ++
>  arch/loongarch/include/asm/kvm_vcpu.h |  11 ++
>  arch/loongarch/include/asm/loongarch.h|  11 ++
>  arch/loongarch/include/asm/paravirt.h |  27 +++
>  .../include/asm/paravirt_api_clock.h  |   1 +
>  arch/loongarch/include/asm/smp.h  |  31 ++--
>  arch/loongarch/include/uapi/asm/Kbuild

Re: [PATCH v6 3/5] regulator: add regulators driver for Marvell 88PM886 PMIC

2024-05-05 Thread Mark Brown

On Sun, May 05, 2024 at 08:52:06PM +0200, Karel Balej wrote:

> Should I then drop this op and the max_uA values from all the
> regulators?

Probably, yes.


signature.asc
Description: PGP signature

Re: [PATCH 2/2] drivers: remoteproc: xlnx: add sram support

2024-05-05 Thread kernel test robot

Hi Tanmay,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 0496190c4d42965acb31b9da1b6dac3509791062]

url:
https://github.com/intel-lab-lkp/linux/commits/Tanmay-Shah/drivers-remoteproc-xlnx-add-attach-detach-support/20240503-071225
base:   0496190c4d42965acb31b9da1b6dac3509791062
patch link:
https://lore.kernel.org/r/20240502231021.370047-3-tanmay.shah%40amd.com
patch subject: [PATCH 2/2] drivers: remoteproc: xlnx: add sram support
config: arm64-randconfig-r113-20240506 
(https://download.01.org/0day-ci/archive/20240506/202405060759.yyzluqxp-...@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce: 
(https://download.01.org/0day-ci/archive/20240506/202405060759.yyzluqxp-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202405060759.yyzluqxp-...@intel.com/

sparse warnings: (new ones prefixed by >>)
   drivers/remoteproc/xlnx_r5_remoteproc.c:423:20: sparse: sparse: cast removes 
address space '__iomem' of expression
   drivers/remoteproc/xlnx_r5_remoteproc.c:604:20: sparse: sparse: cast removes 
address space '__iomem' of expression
   drivers/remoteproc/xlnx_r5_remoteproc.c:827:21: sparse: sparse: incorrect 
type in assignment (different address spaces) @@ expected struct 
rsc_tbl_data *rsc_data_va @@ got void [noderef] __iomem * @@
   drivers/remoteproc/xlnx_r5_remoteproc.c:827:21: sparse: expected struct 
rsc_tbl_data *rsc_data_va
   drivers/remoteproc/xlnx_r5_remoteproc.c:827:21: sparse: got void 
[noderef] __iomem *
   drivers/remoteproc/xlnx_r5_remoteproc.c:844:18: sparse: sparse: incorrect 
type in assignment (different address spaces) @@ expected struct 
resource_table *rsc_addr @@ got void [noderef] __iomem * @@
   drivers/remoteproc/xlnx_r5_remoteproc.c:844:18: sparse: expected struct 
resource_table *rsc_addr
   drivers/remoteproc/xlnx_r5_remoteproc.c:844:18: sparse: got void 
[noderef] __iomem *
   drivers/remoteproc/xlnx_r5_remoteproc.c:898:24: sparse: sparse: incorrect 
type in argument 1 (different address spaces) @@ expected void volatile 
[noderef] __iomem *addr @@ got struct resource_table *rsc_tbl_va @@
   drivers/remoteproc/xlnx_r5_remoteproc.c:898:24: sparse: expected void 
volatile [noderef] __iomem *addr
   drivers/remoteproc/xlnx_r5_remoteproc.c:898:24: sparse: got struct 
resource_table *rsc_tbl_va
>> drivers/remoteproc/xlnx_r5_remoteproc.c:995:26: sparse: sparse: Using plain 
>> integer as NULL pointer

vim +995 drivers/remoteproc/xlnx_r5_remoteproc.c

   798  
   799  static int zynqmp_r5_get_rsc_table_va(struct zynqmp_r5_core *r5_core)
   800  {
   801  struct device *dev = r5_core->dev;
   802  struct rsc_tbl_data *rsc_data_va;
   803  struct resource_table *rsc_addr;
   804  struct resource res_mem;
   805  struct device_node *np;
   806  int ret;
   807  
   808  /**
   809   * It is expected from remote processor firmware to provide 
resource
   810   * table address via struct rsc_tbl_data data structure.
   811   * Start address of first entry under "memory-region" property 
list
   812   * contains that data structure which holds resource table 
address, size
   813   * and some magic number to validate correct resource table 
entry.
   814   */
   815  np = of_parse_phandle(r5_core->np, "memory-region", 0);
   816  if (!np) {
   817  dev_err(dev, "failed to get memory region dev node\n");
   818  return -EINVAL;
   819  }
   820  
   821  ret = of_address_to_resource(np, 0, &res_mem);
   822  if (ret) {
   823  dev_err(dev, "failed to get memory-region resource 
addr\n");
   824  return -EINVAL;
   825  }
   826  
 > 827  rsc_data_va = devm_ioremap_wc(dev, res_mem.start,
   828sizeof(struct rsc_tbl_data));
   829  if (!rsc_data_va) {
   830  dev_err(dev, "failed to map resource table data 
address\n");
   831  return -EIO;
   832  }
   833  
   834  /**
   835   * If RSC_TBL_XLNX_MAGIC number and its complement isn't found 
then
   836   * do not consider resource table address valid and don't attach
   837   */
   838  if (rsc_data_va->magic_num != RSC_TBL_XLNX_MAGIC ||
   839  rsc_data_va->comp_magic_num != ~RSC_TBL_XLNX_MAGIC) {
   840  dev_dbg(dev, "invalid magic number, won't attach\n");
   841  return -EINVAL;
   842  }
   843  
   844  rsc_addr = ioremap_wc(rsc_data_va->rsc_tbl,
   845rsc_data_va->rsc_tbl_size);
   846  if (!rsc

Re: [PATCH 1/2] drivers: remoteproc: xlnx: add attach detach support

2024-05-05 Thread kernel test robot

Hi Tanmay,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 0496190c4d42965acb31b9da1b6dac3509791062]

url:
https://github.com/intel-lab-lkp/linux/commits/Tanmay-Shah/drivers-remoteproc-xlnx-add-attach-detach-support/20240503-071225
base:   0496190c4d42965acb31b9da1b6dac3509791062
patch link:
https://lore.kernel.org/r/20240502231021.370047-2-tanmay.shah%40amd.com
patch subject: [PATCH 1/2] drivers: remoteproc: xlnx: add attach detach support
config: arm64-randconfig-r113-20240506 
(https://download.01.org/0day-ci/archive/20240506/202405060611.jbqbf7ib-...@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce: 
(https://download.01.org/0day-ci/archive/20240506/202405060611.jbqbf7ib-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202405060611.jbqbf7ib-...@intel.com/

sparse warnings: (new ones prefixed by >>)
   drivers/remoteproc/xlnx_r5_remoteproc.c:404:20: sparse: sparse: cast removes 
address space '__iomem' of expression
   drivers/remoteproc/xlnx_r5_remoteproc.c:522:20: sparse: sparse: cast removes 
address space '__iomem' of expression
>> drivers/remoteproc/xlnx_r5_remoteproc.c:731:21: sparse: sparse: incorrect 
>> type in assignment (different address spaces) @@ expected struct 
>> rsc_tbl_data *rsc_data_va @@ got void [noderef] __iomem * @@
   drivers/remoteproc/xlnx_r5_remoteproc.c:731:21: sparse: expected struct 
rsc_tbl_data *rsc_data_va
   drivers/remoteproc/xlnx_r5_remoteproc.c:731:21: sparse: got void 
[noderef] __iomem *
>> drivers/remoteproc/xlnx_r5_remoteproc.c:748:18: sparse: sparse: incorrect 
>> type in assignment (different address spaces) @@ expected struct 
>> resource_table *rsc_addr @@ got void [noderef] __iomem * @@
   drivers/remoteproc/xlnx_r5_remoteproc.c:748:18: sparse: expected struct 
resource_table *rsc_addr
   drivers/remoteproc/xlnx_r5_remoteproc.c:748:18: sparse: got void 
[noderef] __iomem *
>> drivers/remoteproc/xlnx_r5_remoteproc.c:802:24: sparse: sparse: incorrect 
>> type in argument 1 (different address spaces) @@ expected void volatile 
>> [noderef] __iomem *addr @@ got struct resource_table *rsc_tbl_va @@
   drivers/remoteproc/xlnx_r5_remoteproc.c:802:24: sparse: expected void 
volatile [noderef] __iomem *addr
   drivers/remoteproc/xlnx_r5_remoteproc.c:802:24: sparse: got struct 
resource_table *rsc_tbl_va

vim +731 drivers/remoteproc/xlnx_r5_remoteproc.c

   702  
   703  static int zynqmp_r5_get_rsc_table_va(struct zynqmp_r5_core *r5_core)
   704  {
   705  struct device *dev = r5_core->dev;
   706  struct rsc_tbl_data *rsc_data_va;
   707  struct resource_table *rsc_addr;
   708  struct resource res_mem;
   709  struct device_node *np;
   710  int ret;
   711  
   712  /**
   713   * It is expected from remote processor firmware to provide 
resource
   714   * table address via struct rsc_tbl_data data structure.
   715   * Start address of first entry under "memory-region" property 
list
   716   * contains that data structure which holds resource table 
address, size
   717   * and some magic number to validate correct resource table 
entry.
   718   */
   719  np = of_parse_phandle(r5_core->np, "memory-region", 0);
   720  if (!np) {
   721  dev_err(dev, "failed to get memory region dev node\n");
   722  return -EINVAL;
   723  }
   724  
   725  ret = of_address_to_resource(np, 0, &res_mem);
   726  if (ret) {
   727  dev_err(dev, "failed to get memory-region resource 
addr\n");
   728  return -EINVAL;
   729  }
   730  
 > 731  rsc_data_va = devm_ioremap_wc(dev, res_mem.start,
   732sizeof(struct rsc_tbl_data));
   733  if (!rsc_data_va) {
   734  dev_err(dev, "failed to map resource table data 
address\n");
   735  return -EIO;
   736  }
   737  
   738  /**
   739   * If RSC_TBL_XLNX_MAGIC number and its complement isn't found 
then
   740   * do not consider resource table address valid and don't attach
   741   */
   742  if (rsc_data_va->magic_num != RSC_TBL_XLNX_MAGIC ||
   743  rsc_data_va->comp_magic_num != ~RSC_TBL_XLNX_MAGIC) {
   744  dev_dbg(dev, "invalid magic number, won't attach\n");
   745  return -EINVAL;
   746  }
   747  
 > 748  rsc_addr = ioremap_wc(rsc_data_va->rsc_tbl,
   749rsc_data_va->rsc_tbl_size);
   750  if (!rsc_addr) {
   751  dev_err(dev, "failed to get rsc_addr\n");
   75

Re: [PATCH] livepatch.h: Add comment to klp transition state

2024-05-05 Thread Josh Poimboeuf

On Mon, Apr 29, 2024 at 03:26:28PM +0800, zhangwar...@gmail.com wrote:
> From: Wardenjohn 
> 
> livepatch.h use KLP_UNDEFINED\KLP_UNPATCHED\KLP_PATCHED for klp transition 
> state.
> When livepatch is ready but idle, using KLP_UNDEFINED seems very confusing.
> In order not to introduce potential risks to kernel, just update comment
> to these state.
> ---
>  include/linux/livepatch.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
> index 9b9b38e89563..b6a214f2f8e3 100644
> --- a/include/linux/livepatch.h
> +++ b/include/linux/livepatch.h
> @@ -18,9 +18,9 @@
>  #if IS_ENABLED(CONFIG_LIVEPATCH)
>  
>  /* task patch states */
> -#define KLP_UNDEFINED-1
> -#define KLP_UNPATCHED 0
> -#define KLP_PATCHED   1
> +#define KLP_UNDEFINED-1 /* idle, no transition in progress */
> +#define KLP_UNPATCHED 0 /* transitioning to unpatched state */
> +#define KLP_PATCHED   1 /* transitioning to patched state */

Instead of the comments, how about we just rename them to

  KLP_TRANSITION_IDLE
  KLP_TRANSITION_UNPATCHED
  KLP_TRANSITION_PATCHED

which shouldn't break userspace AFAIK.

-- 
Josh

Re: [PATCH virt] virt: fix uninit-value in vhost_vsock_dev_open

2024-05-05 Thread Arseniy Krasnov

> But now that it's explained, the bugfix as proposed is incomplete:
> userspace can set features twice and the second time will leak
> old VIRTIO_VSOCK_F_SEQPACKET bit value.
> 
> And I am pretty sure the Fixes tag is wrong.
> 
> So I wrote this, but I actually don't have a set for
> seqpacket to test this. Arseny could you help test maybe?
> Thanks!

Hi! Sorry for late reply! Just run vsock test suite with this patch -
seems everything is ok!

> 
> 
> commit bcc17a060d93b198d8a17a9b87b593f41337ee28
> Author: Michael S. Tsirkin 
> Date:   Mon Apr 22 10:03:13 2024 -0400
> 
> vhost/vsock: always initialize seqpacket_allow
> 
> There are two issues around seqpacket_allow:
> 1. seqpacket_allow is not initialized when socket is
> created. Thus if features are never set, it will be
> read uninitialized.
> 2. if VIRTIO_VSOCK_F_SEQPACKET is set and then cleared,
> then seqpacket_allow will not be cleared appropriately
> (existing apps I know about don't usually do this but
> it's legal and there's no way to be sure no one relies
> on this).
> 
> To fix:
> - initialize seqpacket_allow after allocation
> - set it unconditionally in set_features
> 
> Reported-by: syzbot+6c21aeb59d0e82eb2...@syzkaller.appspotmail.com
> Reported-by: Jeongjun Park 
> Fixes: ced7b713711f ("vhost/vsock: support SEQPACKET for transport").
> Cc: Arseny Krasnov 
> Cc: David S. Miller 
> Cc: Stefan Hajnoczi 
> Signed-off-by: Michael S. Tsirkin 

Acked-by: Arseniy Krasnov 

> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index ec20ecff85c7..bf664ec9341b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -667,6 +667,7 @@ static int vhost_vsock_dev_open(struct inode *inode, 
> struct file *file)
>   }
>  
>   vsock->guest_cid = 0; /* no CID assigned yet */
> + vsock->seqpacket_allow = false;
>  
>   atomic_set(&vsock->queued_replies, 0);
>  
> @@ -810,8 +811,7 @@ static int vhost_vsock_set_features(struct vhost_vsock 
> *vsock, u64 features)
>   goto err;
>   }
>  
> - if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> - vsock->seqpacket_allow = true;
> + vsock->seqpacket_allow = features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET);
>  
>   for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
>   vq = &vsock->vqs[i];

Re: [PATCH RESEND v8 00/16] mm: jit/text allocator

2024-05-05 Thread Luis Chamberlain

On Sun, May 05, 2024 at 07:06:12PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> The patches are also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v8
> 
> v8:
> * fix intialization of default_execmem_info

Thanks, applied and pushed to modules-next. If we find fixes, let's
please just now have separate patches on top of this series.

  Luis

Re: [PATCH v6 3/5] regulator: add regulators driver for Marvell 88PM886 PMIC

2024-05-05 Thread Karel Balej

Mark Brown, 2024-05-06T00:15:01+09:00:
> On Sat, May 04, 2024 at 09:37:06PM +0200, Karel Balej wrote:
>
> > +static const struct regulator_ops pm886_ldo_ops = {
> > +   .list_voltage = regulator_list_voltage_table,
> > +   .map_voltage = regulator_map_voltage_iterate,
> > +   .set_voltage_sel = regulator_set_voltage_sel_regmap,
> > +   .get_voltage_sel = regulator_get_voltage_sel_regmap,
> > +   .enable = regulator_enable_regmap,
> > +   .disable = regulator_disable_regmap,
> > +   .is_enabled = regulator_is_enabled_regmap,
> > +   .get_current_limit = pm886_regulator_get_ilim,
>
> Do these regulators actually enforce this limit or is this just a spec
> limit beyond which regulation may fail?  If it's just a spec limit I'd
> not expect this operation to be provided, it's more for a hard limit
> where the regulator will detect and act on issues.  I don't see an error
> interrupt or anything and this would be an unusual feature for a LDO.

I'm afraid I don't have the answer -- my only reference is the vendor
version of the driver and I don't see anything there based on which I
would be able to tell.

But based on what you write, my guess would be that it's just a spec limit.

Should I then drop this op and the max_uA values from all the
regulators?

Thank you,
K. B.

Re: [PATCH v8 00/17] mm: jit/text allocator

2024-05-05 Thread Mike Rapoport

This is embarrassing, but these patches were from a wrong branch :(
Please ignore.

On Sun, May 05, 2024 at 05:25:43PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> The patches are also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v8
> 
> v8:
> * fix intialization of default_execmem_info
> 
> v7: https://lore.kernel.org/all/20240429121620.1186447-1-r...@kernel.org
> * define MODULE_{VADDR,END} for riscv32 to fix the build and avoid
>   #ifdefs in a function body
> * add Acks, thanks everybody
> 
> v6: https://lore.kernel.org/all/20240426082854.7355-1-r...@kernel.org
> * restore patch "arm64: extend execmem_info for generated code
>   allocations" that disappeared in v5 rebase
> * update execmem initialization so that by default it will be
>   initialized early while late initialization will be an opt-in
> 
> v5: https://lore.kernel.org/all/20240422094436.3625171-1-r...@kernel.org
> * rebase on v6.9-rc4 to avoid a conflict in kprobes
> * add copyrights to mm/execmem.c (Luis)
> * fix spelling (Ingo)
> * define MODULES_VADDDR for sparc (Sam)
> * consistently initialize struct execmem_info (Peter)
> * reduce #ifdefs in function bodies in kprobes (Masami) 
> 
> v4: https://lore.kernel.org/all/20240411160051.2093261-1-r...@kernel.org
> * rebase on v6.9-rc2
> * rename execmem_params to execmem_info and execmem_arch_params() to
>   execmem_arch_setup()
> * use single execmem_alloc() API instead of execmem_{text,data}_alloc() (Song)
> * avoid extra copy of execmem parameters (Rick)
> * run execmem_init() as core_initcall() except for the architectures that
>   may allocated text really early (currently only x86) (Will)
> * add acks for some of arm64 and riscv changes, thanks Will and Alexandre
> * new commits:
>   - drop call to kasan_alloc_module_shadow() on arm64 because it's not
> needed anymore
>   - rename MODULE_START to MODULES_VADDR on MIPS
>   - use CONFIG_EXECMEM instead of CONFIG_MODULES on powerpc as per Christophe:
> 
> https://lore.kernel.org/all/79062fa3-3402-47b3-8920-9231ad05e...@csgroup.eu/
> 
> v3: https://lore.kernel.org/all/20230918072955.2507221-1-r...@kernel.org
> * add type parameter to execmem allocation APIs
> * remove BPF dependency on modules
> 
> v2: https://lore.kernel.org/all/20230616085038.4121892-1-r...@kernel.org
> * Separate "module" and "others" allocations with execmem_text_alloc()
> and jit_text_alloc()
> * Drop ROX entailment on x86
> * Add ack for nios2 changes, thanks Dinh Nguyen
> 
> v1: https://lore.kernel.org/all/20230601101257.530867-1-r...@kernel.org
> 
> = Cover letter from v1 (sligtly updated) =
> 
> module_alloc() is used everywhere as a mean to allocate memory for code.
> 
> Beside being semantically wrong, this unnecessarily ties all subsystmes
> that need to allocate code, such as ftrace, kprobes and BPF to modules and
> puts the burden of code allocation to the modules code.
> 
> Several architectures override module_alloc() because of various
> constraints where the executable memory can be located and this causes
> additional obstacles for improvements of code allocation.
> 
> A centralized infrastructure for code allocation allows allocations of
> executable memory as ROX, and future optimizations such as caching large
> pages for better iTLB performance and providing sub-page allocations for
> users that only need small jit code snippets.
> 
> Rick Edgecombe proposed perm_alloc extension to vmalloc [1] and Song Liu
> proposed execmem_alloc [2], but both these approaches were targeting BPF
> allocations and lacked the ground work to abstract executable allocations
> and split them from the modules core.
> 
> Thomas Gleixner suggested to express module allocation restrictions and
> requirements as struct mod_alloc_type_params [3] that would define ranges,
> protections and other parameters for different types of allocations used by
> modules and following that suggestion Song separated allocations of
> different types in modules (commit ac3b43283923 ("module: replace
> module_layout with module_memory")) and posted "Type aware module
> allocator" set [4].
> 
> I liked the idea of parametrising code allocation requirements as a
> structure, but I believe the original proposal and Song's module allocator
> was too module centric, so I came up with these patches.
> 
> This set splits code allocation from modules by introducing execmem_alloc()
> and and execmem_free(), APIs, replaces call sites of module_alloc() and
> module_memfree() with the new APIs and implements core text and related
> allocations in a central place.
> 
> Instead of architecture specific overrides for module_alloc(), the
> architectures that require non-default behaviour for text allocation must
> fill execmem_info structure and implement execmem_arch_setup() that returns
> a pointer to that structure. If an architecture does not implement
> execmem_arch_setup(), the defaults compatible with the current
> m

[syzbot] [bpf?] [trace?] general protection fault in bpf_get_attach_cookie_tracing

2024-05-05 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:a9e7715ce8b3 libbpf: Avoid casts from pointers to enums in..
git tree:   bpf-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=153c1dc498
kernel config:  https://syzkaller.appspot.com/x/.config?x=e8aa3e4736485e94
dashboard link: https://syzkaller.appspot.com/bug?extid=3ab78ff125b7979e45f9
compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 
2.40
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=17d4b58898
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16cb047098

Downloadable assets:
disk image: 
https://storage.googleapis.com/syzbot-assets/a6daa7801875/disk-a9e7715c.raw.xz
vmlinux: 
https://storage.googleapis.com/syzbot-assets/0d5b51385a69/vmlinux-a9e7715c.xz
kernel image: 
https://storage.googleapis.com/syzbot-assets/999297a08631/bzImage-a9e7715c.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3ab78ff125b7979e4...@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 
0xdc00:  [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x-0x0007]
CPU: 0 PID: 5082 Comm: syz-executor316 Not tainted 
6.9.0-rc5-syzkaller-01452-ga9e7715ce8b3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
03/27/2024
RIP: 0010:bpf_get_attach_cookie_tracing kernel/trace/bpf_trace.c:1179 
[inline]
RIP: 0010:bpf_get_attach_cookie_tracing+0x46/0x60 kernel/trace/bpf_trace.c:1174
Code: d3 03 00 48 81 c3 00 18 00 00 48 89 d8 48 c1 e8 03 42 80 3c 30 00 74 08 
48 89 df e8 54 b9 59 00 48 8b 1b 48 89 d8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 
89 df e8 3b b9 59 00 48 8b 03 5b 41 5e c3
RSP: 0018:c90002f9fba8 EFLAGS: 00010246
RAX:  RBX:  RCX: 888029575a00
RDX:  RSI: c9ace048 RDI: 
RBP: c90002f9fbc0 R08: 89938ae7 R09: 125e80a0
R10: dc00 R11: a950 R12: c90002f9fc80
R13: dc00 R14: dc00 R15: 
FS:  78992380() GS:8880b940() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2e3e9388 CR3: 791c2000 CR4: 003506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 bpf_prog_fe13437f26555f61+0x1a/0x1c
 bpf_dispatcher_nop_func include/linux/bpf.h:1243 [inline]
 __bpf_prog_run include/linux/filter.h:691 [inline]
 bpf_prog_run include/linux/filter.h:698 [inline]
 __bpf_prog_test_run_raw_tp+0x149/0x310 net/bpf/test_run.c:732
 bpf_prog_test_run_raw_tp+0x47b/0x6a0 net/bpf/test_run.c:772
 bpf_prog_test_run+0x33a/0x3b0 kernel/bpf/syscall.c:4286
 __sys_bpf+0x48d/0x810 kernel/bpf/syscall.c:5700
 __do_sys_bpf kernel/bpf/syscall.c:5789 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5787 [inline]
 __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5787
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f53be8a0469
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7ffdcf680a08 EFLAGS: 0246 ORIG_RAX: 0141
RAX: ffda RBX: 7ffdcf680bd8 RCX: 7f53be8a0469
RDX: 000c RSI: 2080 RDI: 000a
RBP: 7f53be913610 R08:  R09: 7ffdcf680bd8
R10: 7f53be8dbae3 R11: 0246 R12: 0001
R13: 7ffdcf680bc8 R14: 0001 R15: 0001
 
Modules linked in:
---[ end trace  ]---
RIP: 0010:bpf_get_attach_cookie_tracing kernel/trace/bpf_trace.c:1179 
[inline]
RIP: 0010:bpf_get_attach_cookie_tracing+0x46/0x60 kernel/trace/bpf_trace.c:1174
Code: d3 03 00 48 81 c3 00 18 00 00 48 89 d8 48 c1 e8 03 42 80 3c 30 00 74 08 
48 89 df e8 54 b9 59 00 48 8b 1b 48 89 d8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 
89 df e8 3b b9 59 00 48 8b 03 5b 41 5e c3
RSP: 0018:c90002f9fba8 EFLAGS: 00010246
RAX:  RBX:  RCX: 888029575a00
RDX:  RSI: c9ace048 RDI: 
RBP: c90002f9fbc0 R08: 89938ae7 R09: 125e80a0
R10: dc00 R11: a950 R12: c90002f9fc80
R13: dc00 R14: dc00 R15: 
FS:  78992380() GS:8880b940() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2e3e9388 CR3: 791c2000 CR4: 003506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400

Code disass

[PATCH RESEND v8 16/16] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

BPF just-in-time compiler depended on CONFIG_MODULES because it used
module_alloc() to allocate memory for the generated code.

Since code allocations are now implemented with execmem, drop dependency of
CONFIG_BPF_JIT on CONFIG_MODULES and make it select CONFIG_EXECMEM.

Suggested-by: Björn Töpel 
Signed-off-by: Mike Rapoport (IBM) 
---
 kernel/bpf/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index bc25f5098a25..f999e4e0b344 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -43,7 +43,7 @@ config BPF_JIT
bool "Enable BPF Just In Time compiler"
depends on BPF
depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
-   depends on MODULES
+   select EXECMEM
help
  BPF programs are normally handled by a BPF interpreter. This option
  allows the kernel to generate native code when a program is loaded
-- 
2.43.0

[PATCH RESEND v8 15/16] kprobes: remove dependency on CONFIG_MODULES

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

kprobes depended on CONFIG_MODULES because it has to allocate memory for
code.

Since code allocations are now implemented with execmem, kprobes can be
enabled in non-modular kernels.

Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
dependency of CONFIG_KPROBES on CONFIG_MODULES.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Masami Hiramatsu (Google) 
---
 arch/Kconfig|  2 +-
 include/linux/module.h  |  9 ++
 kernel/kprobes.c| 55 +++--
 kernel/trace/trace_kprobe.c | 20 +-
 4 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 4fd0daa54e6c..caa459964f09 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -52,9 +52,9 @@ config GENERIC_ENTRY
 
 config KPROBES
bool "Kprobes"
-   depends on MODULES
depends on HAVE_KPROBES
select KALLSYMS
+   select EXECMEM
select TASKS_RCU if PREEMPTION
help
  Kprobes allows you to trap at almost any kernel address and
diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..ffa1c603163c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -605,6 +605,11 @@ static inline bool module_is_live(struct module *mod)
return mod->state != MODULE_STATE_GOING;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+return mod->state == MODULE_STATE_COMING;
+}
+
 struct module *__module_text_address(unsigned long addr);
 struct module *__module_address(unsigned long addr);
 bool is_module_address(unsigned long addr);
@@ -857,6 +862,10 @@ void *dereference_module_function_descriptor(struct module 
*mod, void *ptr)
return ptr;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+   return false;
+}
 #endif /* CONFIG_MODULES */
 
 #ifdef CONFIG_SYSFS
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ddd7cdc16edf..ca2c6cbd42d2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1588,7 +1588,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
}
 
/* Get module refcount and reject __init functions for loaded modules. 
*/
-   if (*probed_mod) {
+   if (IS_ENABLED(CONFIG_MODULES) && *probed_mod) {
/*
 * We must hold a refcount of the probed module while updating
 * its code to prohibit unexpected unloading.
@@ -1603,12 +1603,13 @@ static int check_kprobe_address_safe(struct kprobe *p,
 * kprobes in there.
 */
if (within_module_init((unsigned long)p->addr, *probed_mod) &&
-   (*probed_mod)->state != MODULE_STATE_COMING) {
+   !module_is_coming(*probed_mod)) {
module_put(*probed_mod);
*probed_mod = NULL;
ret = -ENOENT;
}
}
+
 out:
preempt_enable();
jump_label_unlock();
@@ -2488,24 +2489,6 @@ int kprobe_add_area_blacklist(unsigned long start, 
unsigned long end)
return 0;
 }
 
-/* Remove all symbols in given area from kprobe blacklist */
-static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
-{
-   struct kprobe_blacklist_entry *ent, *n;
-
-   list_for_each_entry_safe(ent, n, &kprobe_blacklist, list) {
-   if (ent->start_addr < start || ent->start_addr >= end)
-   continue;
-   list_del(&ent->list);
-   kfree(ent);
-   }
-}
-
-static void kprobe_remove_ksym_blacklist(unsigned long entry)
-{
-   kprobe_remove_area_blacklist(entry, entry + 1);
-}
-
 int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
   char *type, char *sym)
 {
@@ -2570,6 +2553,25 @@ static int __init populate_kprobe_blacklist(unsigned 
long *start,
return ret ? : arch_populate_kprobe_blacklist();
 }
 
+#ifdef CONFIG_MODULES
+/* Remove all symbols in given area from kprobe blacklist */
+static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
+{
+   struct kprobe_blacklist_entry *ent, *n;
+
+   list_for_each_entry_safe(ent, n, &kprobe_blacklist, list) {
+   if (ent->start_addr < start || ent->start_addr >= end)
+   continue;
+   list_del(&ent->list);
+   kfree(ent);
+   }
+}
+
+static void kprobe_remove_ksym_blacklist(unsigned long entry)
+{
+   kprobe_remove_area_blacklist(entry, entry + 1);
+}
+
 static void add_module_kprobe_blacklist(struct module *mod)
 {
unsigned long start, end;
@@ -2672,6 +2674,17 @@ static struct notifier_block kprobe_module_nb = {
.priority = 0
 };
 
+static int kprobe_register_module_notifier(void)
+{
+   return register_module_notifier(&kprobe_module_nb);
+}
+#el

[PATCH RESEND v8 14/16] powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

There are places where CONFIG_MODULES guards the code that depends on
memory allocation being done with module_alloc().

Replace CONFIG_MODULES with CONFIG_EXECMEM in such places.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/include/asm/kasan.h | 2 +-
 arch/powerpc/kernel/head_8xx.S   | 4 ++--
 arch/powerpc/kernel/head_book3s_32.S | 6 +++---
 arch/powerpc/lib/code-patching.c | 2 +-
 arch/powerpc/mm/book3s32/mmu.c   | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..2e586733a464 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -285,7 +285,7 @@ config PPC
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
-   select KASAN_VMALLOCif KASAN && MODULES
+   select KASAN_VMALLOCif KASAN && EXECMEM
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_PAGE_SIZE
select MMU_GATHER_RCU_TABLE_FREE
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 365d2720097c..b5bbb94c51f6 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -19,7 +19,7 @@
 
 #define KASAN_SHADOW_SCALE_SHIFT   3
 
-#if defined(CONFIG_MODULES) && defined(CONFIG_PPC32)
+#if defined(CONFIG_EXECMEM) && defined(CONFIG_PPC32)
 #define KASAN_KERN_START   ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
 #else
 #define KASAN_KERN_START   PAGE_OFFSET
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 647b0b445e89..edc479a7c2bc 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -199,12 +199,12 @@ instruction_counter:
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r10, r11)
mtspr   SPRN_MD_EPN, r10
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
mfcrr11
compare_to_kernel_boundary r10, r10
 #endif
mfspr   r10, SPRN_M_TWB /* Get level 1 table */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
blt+3f
rlwinm  r10, r10, 0, 20, 31
orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index c1d89764dd22..57196883a00e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -419,14 +419,14 @@ InstructionTLBMiss:
  */
/* Get PTE (linux-style) and check access */
mfspr   r3,SPRN_IMISS
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
 #endif
mfspr   r2, SPRN_SDR1
li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
rlwinm  r2, r2, 28, 0xf000
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
li  r0, 3
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
@@ -442,7 +442,7 @@ InstructionTLBMiss:
andc.   r1,r1,r2/* check access & ~permission */
bne-InstructionAddressInvalid /* return if access not permitted */
/* Convert linux-style PTE to low word of PPC-style PTE */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
rlwimi  r2, r0, 0, 31, 31   /* userspace ? -> PP lsb */
 #endif
ori r1, r1, 0xe06   /* clear out reserved bits */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..7af791446ddf 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -225,7 +225,7 @@ void __init poking_init(void)
 
 static unsigned long get_patch_pfn(void *addr)
 {
-   if (IS_ENABLED(CONFIG_MODULES) && is_vmalloc_or_module_addr(addr))
+   if (IS_ENABLED(CONFIG_EXECMEM) && is_vmalloc_or_module_addr(addr))
return vmalloc_to_pfn(addr);
else
return __pa_symbol(addr) >> PAGE_SHIFT;
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 100f999871bc..625fe7d08e06 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -184,7 +184,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 
 static bool is_module_segment(unsigned long addr)
 {
-   if (!IS_ENABLED(CONFIG_MODULES))
+   if (!IS_ENABLED(CONFIG_EXECMEM))
return false;
if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
return false;
-- 
2.43.0

[PATCH RESEND v8 13/16] x86/ftrace: enable dynamic ftrace without CONFIG_MODULES

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Dynamic ftrace must allocate memory for code and this was impossible
without CONFIG_MODULES.

With execmem separated from the modules code, execmem_text_alloc() is
available regardless of CONFIG_MODULES.

Remove dependency of dynamic ftrace on CONFIG_MODULES and make
CONFIG_DYNAMIC_FTRACE select CONFIG_EXECMEM in Kconfig.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/x86/Kconfig |  1 +
 arch/x86/kernel/ftrace.c | 10 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4474bf32d0a4..f2917ccf4fb4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,6 +34,7 @@ config X86_64
select SWIOTLB
select ARCH_HAS_ELFCORE_COMPAT
select ZONE_DMA32
+   select EXECMEM if DYNAMIC_FTRACE
 
 config FORCE_DYNAMIC_FTRACE
def_bool y
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index c8ddb7abda7c..8da0e66ca22d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -261,8 +261,6 @@ void arch_ftrace_update_code(int command)
 /* Currently only x86_64 supports dynamic trampolines */
 #ifdef CONFIG_X86_64
 
-#ifdef CONFIG_MODULES
-/* Module allocation simplifies allocating memory for code */
 static inline void *alloc_tramp(unsigned long size)
 {
return execmem_alloc(EXECMEM_FTRACE, size);
@@ -271,14 +269,6 @@ static inline void tramp_free(void *tramp)
 {
execmem_free(tramp);
 }
-#else
-/* Trampolines can only be created if modules are supported */
-static inline void *alloc_tramp(unsigned long size)
-{
-   return NULL;
-}
-static inline void tramp_free(void *tramp) { }
-#endif
 
 /* Defined as markers to the end of the ftrace default trampolines */
 extern void ftrace_regs_caller_end(void);
-- 
2.43.0

[PATCH RESEND v8 12/16] arch: make execmem setup available regardless of CONFIG_MODULES

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

execmem does not depend on modules, on the contrary modules use
execmem.

To make execmem available when CONFIG_MODULES=n, for instance for
kprobes, split execmem_params initialization out from
arch/*/kernel/module.c and compile it when CONFIG_EXECMEM=y

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Philippe Mathieu-Daudé 
---
 arch/arm/kernel/module.c   |  43 --
 arch/arm/mm/init.c |  45 +++
 arch/arm64/kernel/module.c | 140 -
 arch/arm64/mm/init.c   | 140 +
 arch/loongarch/kernel/module.c |  19 -
 arch/loongarch/mm/init.c   |  21 +
 arch/mips/kernel/module.c  |  22 --
 arch/mips/mm/init.c|  23 ++
 arch/nios2/kernel/module.c |  20 -
 arch/nios2/mm/init.c   |  21 +
 arch/parisc/kernel/module.c|  20 -
 arch/parisc/mm/init.c  |  23 +-
 arch/powerpc/kernel/module.c   |  63 ---
 arch/powerpc/mm/mem.c  |  64 +++
 arch/riscv/kernel/module.c |  34 
 arch/riscv/mm/init.c   |  35 +
 arch/s390/kernel/module.c  |  27 ---
 arch/s390/mm/init.c|  30 +++
 arch/sparc/kernel/module.c |  19 -
 arch/sparc/mm/Makefile |   2 +
 arch/sparc/mm/execmem.c|  21 +
 arch/x86/kernel/module.c   |  27 ---
 arch/x86/mm/init.c |  29 +++
 23 files changed, 453 insertions(+), 435 deletions(-)
 create mode 100644 arch/sparc/mm/execmem.c

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index a98fdf6ff26c..677f218f7e84 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -12,57 +12,14 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
 
 #include 
 #include 
 #include 
 #include 
 
-#ifdef CONFIG_XIP_KERNEL
-/*
- * The XIP kernel text is mapped in the module area for modules and
- * some other stuff to work without any indirect relocations.
- * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
- * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
- */
-#undef MODULES_VADDR
-#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
-#endif
-
-#ifdef CONFIG_MMU
-static struct execmem_info execmem_info __ro_after_init;
-
-struct execmem_info __init *execmem_arch_setup(void)
-{
-   unsigned long fallback_start = 0, fallback_end = 0;
-
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
-   fallback_start = VMALLOC_START;
-   fallback_end = VMALLOC_END;
-   }
-
-   execmem_info = (struct execmem_info){
-   .ranges = {
-   [EXECMEM_DEFAULT] = {
-   .start  = MODULES_VADDR,
-   .end= MODULES_END,
-   .pgprot = PAGE_KERNEL_EXEC,
-   .alignment = 1,
-   .fallback_start = fallback_start,
-   .fallback_end   = fallback_end,
-   },
-   },
-   };
-
-   return &execmem_info;
-}
-#endif
-
 bool module_init_section(const char *name)
 {
return strstarts(name, ".init") ||
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index e8c6f4be0ce1..5345d218899a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -486,3 +487,47 @@ void free_initrd_mem(unsigned long start, unsigned long 
end)
free_reserved_area((void *)start, (void *)end, -1, "initrd");
 }
 #endif
+
+#ifdef CONFIG_EXECMEM
+
+#ifdef CONFIG_XIP_KERNEL
+/*
+ * The XIP kernel text is mapped in the module area for modules and
+ * some other stuff to work without any indirect relocations.
+ * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
+ * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
+ */
+#undef MODULES_VADDR
+#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
+#endif
+
+#ifdef CONFIG_MMU
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
+{
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+

[PATCH RESEND v8 11/16] powerpc: extend execmem_params for kprobes allocations

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

powerpc overrides kprobes::alloc_insn_page() to remove writable
permissions when STRICT_MODULE_RWX is on.

Add definition of EXECMEM_KRPOBES to execmem_params to allow using the
generic kprobes::alloc_insn_page() with the desired permissions.

As powerpc uses breakpoint instructions to inject kprobes, it does not
need to constrain kprobe allocations to the modules area and can use the
entire vmalloc address space.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/kernel/kprobes.c | 20 
 arch/powerpc/kernel/module.c  |  7 +++
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9fcd01bb2ce6..14c5ddec3056 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -126,26 +126,6 @@ kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long 
addr, unsigned long offse
return (kprobe_opcode_t *)(addr + offset);
 }
 
-void *alloc_insn_page(void)
-{
-   void *page;
-
-   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
-   if (!page)
-   return NULL;
-
-   if (strict_module_rwx_enabled()) {
-   int err = set_memory_rox((unsigned long)page, 1);
-
-   if (err)
-   goto error;
-   }
-   return page;
-error:
-   execmem_free(page);
-   return NULL;
-}
-
 int arch_prepare_kprobe(struct kprobe *p)
 {
int ret = 0;
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index ac80559015a3..2a23cf7e141b 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -94,6 +94,7 @@ static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
 {
+   pgprot_t kprobes_prot = strict_module_rwx_enabled() ? PAGE_KERNEL_ROX : 
PAGE_KERNEL_EXEC;
pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : 
PAGE_KERNEL_EXEC;
unsigned long fallback_start = 0, fallback_end = 0;
unsigned long start, end;
@@ -132,6 +133,12 @@ struct execmem_info __init *execmem_arch_setup(void)
.fallback_start = fallback_start,
.fallback_end   = fallback_end,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = kprobes_prot,
+   .alignment = 1,
+   },
[EXECMEM_MODULE_DATA] = {
.start  = VMALLOC_START,
.end= VMALLOC_END,
-- 
2.43.0

[PATCH RESEND v8 10/16] arm64: extend execmem_info for generated code allocations

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

The memory allocations for kprobes and BPF on arm64 can be placed
anywhere in vmalloc address space and currently this is implemented with
overrides of alloc_insn_page() and bpf_jit_alloc_exec() in arm64.

Define EXECMEM_KPROBES and EXECMEM_BPF ranges in arm64::execmem_info and
drop overrides of alloc_insn_page() and bpf_jit_alloc_exec().

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Will Deacon 
---
 arch/arm64/kernel/module.c | 12 
 arch/arm64/kernel/probes/kprobes.c |  7 ---
 arch/arm64/net/bpf_jit_comp.c  | 11 ---
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index b7a7a23f9f8f..a52240ea084b 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -146,6 +146,18 @@ struct execmem_info __init *execmem_arch_setup(void)
.fallback_start = fallback_start,
.fallback_end   = fallback_end,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL_ROX,
+   .alignment = 1,
+   },
+   [EXECMEM_BPF] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
},
};
 
diff --git a/arch/arm64/kernel/probes/kprobes.c 
b/arch/arm64/kernel/probes/kprobes.c
index 327855a11df2..4268678d0e86 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -129,13 +129,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
return 0;
 }
 
-void *alloc_insn_page(void)
-{
-   return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS,
-   NUMA_NO_NODE, __builtin_return_address(0));
-}
-
 /* arm kprobe: install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 122021f9bdfc..456f5af239fc 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1793,17 +1793,6 @@ u64 bpf_jit_alloc_exec_limit(void)
return VMALLOC_END - VMALLOC_START;
 }
 
-void *bpf_jit_alloc_exec(unsigned long size)
-{
-   /* Memory is intended to be executable, reset the pointer tag. */
-   return kasan_reset_tag(vmalloc(size));
-}
-
-void bpf_jit_free_exec(void *addr)
-{
-   return vfree(addr);
-}
-
 /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool bpf_jit_supports_subprog_tailcalls(void)
 {
-- 
2.43.0

[PATCH RESEND v8 09/16] riscv: extend execmem_params for generated code allocations

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

The memory allocations for kprobes and BPF on RISC-V are not placed in
the modules area and these custom allocations are implemented with
overrides of alloc_insn_page() and  bpf_jit_alloc_exec().

Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END for
32 bit and slightly reorder execmem_params initialization to support both
32 and 64 bit variants, define EXECMEM_KPROBES and EXECMEM_BPF ranges in
riscv::execmem_params and drop overrides of alloc_insn_page() and
bpf_jit_alloc_exec().

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Alexandre Ghiti 
---
 arch/riscv/include/asm/pgtable.h   |  3 +++
 arch/riscv/kernel/module.c | 14 +-
 arch/riscv/kernel/probes/kprobes.c | 10 --
 arch/riscv/net/bpf_jit_core.c  | 13 -
 4 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9f8ea0e33eb1..5f21814e438e 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -55,6 +55,9 @@
 #define MODULES_LOWEST_VADDR   (KERNEL_LINK_ADDR - SZ_2G)
 #define MODULES_VADDR  (PFN_ALIGN((unsigned long)&_end) - SZ_2G)
 #define MODULES_END(PFN_ALIGN((unsigned long)&_start))
+#else
+#define MODULES_VADDR  VMALLOC_START
+#define MODULES_ENDVMALLOC_END
 #endif
 
 /*
diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
index 182904127ba0..0e6415f00fca 100644
--- a/arch/riscv/kernel/module.c
+++ b/arch/riscv/kernel/module.c
@@ -906,7 +906,7 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
+#ifdef CONFIG_MMU
 static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
@@ -919,6 +919,18 @@ struct execmem_info __init *execmem_arch_setup(void)
.pgprot = PAGE_KERNEL,
.alignment = 1,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL_READ_EXEC,
+   .alignment = 1,
+   },
+   [EXECMEM_BPF] = {
+   .start  = BPF_JIT_REGION_START,
+   .end= BPF_JIT_REGION_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = PAGE_SIZE,
+   },
},
};
 
diff --git a/arch/riscv/kernel/probes/kprobes.c 
b/arch/riscv/kernel/probes/kprobes.c
index 2f08c14a933d..e64f2f3064eb 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -104,16 +104,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
return 0;
 }
 
-#ifdef CONFIG_MMU
-void *alloc_insn_page(void)
-{
-   return  __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
-GFP_KERNEL, PAGE_KERNEL_READ_EXEC,
-VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-__builtin_return_address(0));
-}
-#endif
-
 /* install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 6b3acac30c06..e238fdbd5dbc 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -219,19 +219,6 @@ u64 bpf_jit_alloc_exec_limit(void)
return BPF_JIT_REGION_SIZE;
 }
 
-void *bpf_jit_alloc_exec(unsigned long size)
-{
-   return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
-   BPF_JIT_REGION_END, GFP_KERNEL,
-   PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-}
-
-void bpf_jit_free_exec(void *addr)
-{
-   return vfree(addr);
-}
-
 void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
int ret;
-- 
2.43.0

[PATCH RESEND v8 08/16] mm/execmem, arch: convert remaining overrides of module_alloc to execmem

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Extend execmem parameters to accommodate more complex overrides of
module_alloc() by architectures.

This includes specification of a fallback range required by arm, arm64
and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for
allocation of KASAN shadow required by s390 and x86 and support for
late initialization of execmem required by arm64.

The core implementation of execmem_alloc() takes care of suppressing
warnings when the initial allocation fails but there is a fallback range
defined.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Will Deacon 
Acked-by: Song Liu 
Tested-by: Liviu Dudau 
---
 arch/Kconfig |  8 
 arch/arm/kernel/module.c | 41 
 arch/arm64/Kconfig   |  1 +
 arch/arm64/kernel/module.c   | 55 +++
 arch/powerpc/kernel/module.c | 60 +++--
 arch/s390/kernel/module.c| 54 +++---
 arch/x86/kernel/module.c | 70 +++---
 include/linux/execmem.h  | 30 ++-
 include/linux/moduleloader.h | 12 --
 kernel/module/main.c | 26 +++--
 mm/execmem.c | 74 ++--
 11 files changed, 246 insertions(+), 185 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 65afb1de48b3..4fd0daa54e6c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -960,6 +960,14 @@ config ARCH_WANTS_MODULES_DATA_IN_VMALLOC
  For architectures like powerpc/32 which have constraints on module
  allocation and need to allocate module data outside of module area.
 
+config ARCH_WANTS_EXECMEM_LATE
+   bool
+   help
+ For architectures that do not allocate executable memory early on
+ boot, but rather require its initialization late when there is
+ enough entropy for module space randomization, for instance
+ arm64.
+
 config HAVE_IRQ_EXIT_ON_IRQ_STACK
bool
help
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index e74d84f58b77..a98fdf6ff26c 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,23 +35,31 @@
 #endif
 
 #ifdef CONFIG_MMU
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   gfp_t gfp_mask = GFP_KERNEL;
-   void *p;
-
-   /* Silence the initial allocation */
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
-   gfp_mask |= __GFP_NOWARN;
-
-   p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-   if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
-   return p;
-   return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+   },
+   },
+   };
+
+   return &execmem_info;
 }
 #endif
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..74b34a78b7ac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -105,6 +105,7 @@ config ARM64
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES 
&& !ARM64_VA_BITS_36)
select ARCH_WANT_LD_ORPHAN_WARN
+   select ARCH_WANTS_EXECMEM_LATE if EXECMEM
select ARCH_WANTS_NO_INSTR
select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
select ARCH_HAS_UBSAN
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index e92da4da1b2a..b7a7a23f9f8f 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -108,41 +109,47 @@ static int __init module_init_limits(void)
 
return 0;
 }
-subsys_initcall(module_init_limits);
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *ex

[PATCH RESEND v8 07/16] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Several architectures override module_alloc() only to define address
range for code allocations different than VMALLOC address space.

Provide a generic implementation in execmem that uses the parameters for
address space ranges, required alignment and page protections provided
by architectures.

The architectures must fill execmem_info structure and implement
execmem_arch_setup() that returns a pointer to that structure. This way the
execmem initialization won't be called from every architecture, but rather
from a central place, namely a core_initcall() in execmem.

The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
with the parameters defined by the architectures.  If an architecture does
not implement execmem_arch_setup(), execmem_alloc() will fall back to
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Song Liu 
---
 arch/loongarch/kernel/module.c | 19 --
 arch/mips/kernel/module.c  | 20 --
 arch/nios2/kernel/module.c | 21 ---
 arch/parisc/kernel/module.c| 24 
 arch/riscv/kernel/module.c | 24 
 arch/sparc/kernel/module.c | 20 --
 include/linux/execmem.h| 47 
 mm/execmem.c   | 67 --
 mm/mm_init.c   |  2 +
 9 files changed, 210 insertions(+), 34 deletions(-)

diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
index c7d0338d12c1..ca6dd7ea1610 100644
--- a/arch/loongarch/kernel/module.c
+++ b/arch/loongarch/kernel/module.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -490,10 +491,22 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, 
__builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return &execmem_info;
 }
 
 static void module_init_ftrace_plt(const Elf_Ehdr *hdr,
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 9a6c96014904..59225a3cf918 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct mips_hi16 {
@@ -32,11 +33,22 @@ static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
 #ifdef MODULES_VADDR
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return &execmem_info;
 }
 #endif
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 9c97b7513853..0d1ee86631fc 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -18,15 +18,26 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC,
-   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return &execmem_info;
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
diff --g

[PATCH RESEND v8 06/16] mm: introduce execmem_alloc() and execmem_free()

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystems
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

Start splitting code allocation from modules by introducing execmem_alloc()
and execmem_free() APIs.

Initially, execmem_alloc() is a wrapper for module_alloc() and
execmem_free() is a replacement of module_memfree() to allow updating all
call sites to use the new APIs.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem_alloc() takes
a type argument, that will be used to identify the calling subsystem and to
allow architectures define parameters for ranges suitable for that
subsystem.

No functional changes.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Masami Hiramatsu (Google) 
Acked-by: Song Liu 
---
 arch/powerpc/kernel/kprobes.c|  6 ++--
 arch/s390/kernel/ftrace.c|  4 +--
 arch/s390/kernel/kprobes.c   |  4 +--
 arch/s390/kernel/module.c|  5 +--
 arch/sparc/net/bpf_jit_comp_32.c |  8 ++---
 arch/x86/kernel/ftrace.c |  6 ++--
 arch/x86/kernel/kprobes/core.c   |  4 +--
 include/linux/execmem.h  | 57 
 include/linux/moduleloader.h |  3 --
 kernel/bpf/core.c|  6 ++--
 kernel/kprobes.c |  8 ++---
 kernel/module/Kconfig|  1 +
 kernel/module/main.c | 25 +-
 mm/Kconfig   |  3 ++
 mm/Makefile  |  1 +
 mm/execmem.c | 32 ++
 16 files changed, 128 insertions(+), 45 deletions(-)
 create mode 100644 include/linux/execmem.h
 create mode 100644 mm/execmem.c

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index bbca90a5e2ec..9fcd01bb2ce6 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -19,8 +19,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -130,7 +130,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
 
@@ -142,7 +142,7 @@ void *alloc_insn_page(void)
}
return page;
 error:
-   module_memfree(page);
+   execmem_free(page);
return NULL;
 }
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..798249ef5646 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -7,13 +7,13 @@
  *   Author(s): Martin Schwidefsky 
  */
 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -220,7 +220,7 @@ static int __init ftrace_plt_init(void)
 {
const char *start, *end;
 
-   ftrace_plt = module_alloc(PAGE_SIZE);
+   ftrace_plt = execmem_alloc(EXECMEM_FTRACE, PAGE_SIZE);
if (!ftrace_plt)
panic("cannot allocate ftrace plt\n");
 
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index f0cf20d4b3c5..3c1b1be744de 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -9,7 +9,6 @@
 
 #define pr_fmt(fmt) "kprobes: " fmt
 
-#include 
 #include 
 #include 
 #include 
@@ -21,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -38,7 +38,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
set_memory_rox((unsigned long)page, 1);
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 42215f9404af..ac97a905e8cd 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,7 +77,7 @@ void *module_alloc(unsigned long size)
 #ifdef CONFIG_FUNCTION_TRACER
 void module_arch_cleanup(struct module *mod)
 {
-   module_memfree(mod->arch.trampolines_start);
+   execmem_free(mod->arch.trampolines_start);
 }
 #endif
 
@@ -510,7 +511,7 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct 
module *me,
 
size = FTRACE_HOTPATCH_TRAMPOLINES_SIZE(s->sh_size);
numpages = DIV_ROUND_UP(size, PAGE_SIZE);
-   start = module_alloc(numpages * PAGE_SIZE);
+   start = execmem_alloc(EXECMEM_FTRACE, numpages * PAGE_SIZE);
if (!start)

[PATCH RESEND v8 05/16] module: make module_memory_{alloc,free} more self-contained

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Move the logic related to the memory allocation and freeing into
module_memory_alloc() and module_memory_free().

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Philippe Mathieu-Daudé 
---
 kernel/module/main.c | 64 +++-
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..5b82b069e0d3 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1203,15 +1203,44 @@ static bool mod_mem_use_vmalloc(enum mod_mem_type type)
mod_mem_type_is_core_data(type);
 }
 
-static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
+static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
 {
+   unsigned int size = PAGE_ALIGN(mod->mem[type].size);
+   void *ptr;
+
+   mod->mem[type].size = size;
+
if (mod_mem_use_vmalloc(type))
-   return vzalloc(size);
-   return module_alloc(size);
+   ptr = vmalloc(size);
+   else
+   ptr = module_alloc(size);
+
+   if (!ptr)
+   return -ENOMEM;
+
+   /*
+* The pointer to these blocks of memory are stored on the module
+* structure and we keep that around so long as the module is
+* around. We only free that memory when we unload the module.
+* Just mark them as not being a leak then. The .init* ELF
+* sections *do* get freed after boot so we *could* treat them
+* slightly differently with kmemleak_ignore() and only grey
+* them out as they work as typical memory allocations which
+* *do* eventually get freed, but let's just keep things simple
+* and avoid *any* false positives.
+*/
+   kmemleak_not_leak(ptr);
+
+   memset(ptr, 0, size);
+   mod->mem[type].base = ptr;
+
+   return 0;
 }
 
-static void module_memory_free(void *ptr, enum mod_mem_type type)
+static void module_memory_free(struct module *mod, enum mod_mem_type type)
 {
+   void *ptr = mod->mem[type].base;
+
if (mod_mem_use_vmalloc(type))
vfree(ptr);
else
@@ -1229,12 +1258,12 @@ static void free_mod_mem(struct module *mod)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
-   module_memory_free(mod_mem->base, type);
+   module_memory_free(mod, type);
}
 
/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, 
mod->mem[MOD_DATA].size);
-   module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
+   module_memory_free(mod, MOD_DATA);
 }
 
 /* Free a module, remove from lists, etc. */
@@ -2225,7 +2254,6 @@ static int find_module_sections(struct module *mod, 
struct load_info *info)
 static int move_module(struct module *mod, struct load_info *info)
 {
int i;
-   void *ptr;
enum mod_mem_type t = 0;
int ret = -ENOMEM;
 
@@ -2234,26 +2262,12 @@ static int move_module(struct module *mod, struct 
load_info *info)
mod->mem[type].base = NULL;
continue;
}
-   mod->mem[type].size = PAGE_ALIGN(mod->mem[type].size);
-   ptr = module_memory_alloc(mod->mem[type].size, type);
-   /*
- * The pointer to these blocks of memory are stored on the 
module
- * structure and we keep that around so long as the module is
- * around. We only free that memory when we unload the module.
- * Just mark them as not being a leak then. The .init* ELF
- * sections *do* get freed after boot so we *could* treat them
- * slightly differently with kmemleak_ignore() and only grey
- * them out as they work as typical memory allocations which
- * *do* eventually get freed, but let's just keep things simple
- * and avoid *any* false positives.
-*/
-   kmemleak_not_leak(ptr);
-   if (!ptr) {
+
+   ret = module_memory_alloc(mod, type);
+   if (ret) {
t = type;
goto out_enomem;
}
-   memset(ptr, 0, mod->mem[type].size);
-   mod->mem[type].base = ptr;
}
 
/* Transfer each section which specifies SHF_ALLOC */
@@ -2296,7 +2310,7 @@ static int move_module(struct module *mod, struct 
load_info *info)
return 0;
 out_enomem:
for (t--; t >= 0; t--)
-   module_memory_free(mod->mem[t].base, t);
+   module_memory_free(mod, t);
return ret;
 }
 
-- 
2.43.0

[PATCH RESEND v8 04/16] sparc: simplify module_alloc()

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END
for 32-bit and reduce module_alloc() to

__vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, ...)

as with the new defines the allocations becomes identical for both 32
and 64 bits.

While on it, drop unused include of 

Suggested-by: Sam Ravnborg 
Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Sam Ravnborg 
---
 arch/sparc/include/asm/pgtable_32.h |  2 ++
 arch/sparc/kernel/module.c  | 25 +
 2 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_32.h 
b/arch/sparc/include/asm/pgtable_32.h
index 9e85d57ac3f2..62bcafe38b1f 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -432,6 +432,8 @@ static inline int io_remap_pfn_range(struct vm_area_struct 
*vma,
 
 #define VMALLOC_START   _AC(0xfe60,UL)
 #define VMALLOC_END _AC(0xffc0,UL)
+#define MODULES_VADDR   VMALLOC_START
+#define MODULES_END VMALLOC_END
 
 /* We provide our own get_unmapped_area to cope with VA holes for userland */
 #define HAVE_ARCH_UNMAPPED_AREA
diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
index 66c45a2764bc..d37adb2a0b54 100644
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -21,35 +21,12 @@
 
 #include "entry.h"
 
-#ifdef CONFIG_SPARC64
-
-#include 
-
-static void *module_map(unsigned long size)
+void *module_alloc(unsigned long size)
 {
-   if (PAGE_ALIGN(size) > MODULES_LEN)
-   return NULL;
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
-#else
-static void *module_map(unsigned long size)
-{
-   return vmalloc(size);
-}
-#endif /* CONFIG_SPARC64 */
-
-void *module_alloc(unsigned long size)
-{
-   void *ret;
-
-   ret = module_map(size);
-   if (ret)
-   memset(ret, 0, size);
-
-   return ret;
-}
 
 /* Make generic code ignore STT_REGISTER dummy undefined symbols.  */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
-- 
2.43.0

[PATCH RESEND v8 03/16] nios2: define virtual address space for modules

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

nios2 uses kmalloc() to implement module_alloc() because CALL26/PCREL26
cannot reach all of vmalloc address space.

Define module space as 32MiB below the kernel base and switch nios2 to
use vmalloc for module allocations.

Suggested-by: Thomas Gleixner 
Acked-by: Dinh Nguyen 
Acked-by: Song Liu 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/nios2/include/asm/pgtable.h |  5 -
 arch/nios2/kernel/module.c   | 19 ---
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index d052dfcbe8d3..eab87c6beacb 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -25,7 +25,10 @@
 #include 
 
 #define VMALLOC_START  CONFIG_NIOS2_KERNEL_MMU_REGION_BASE
-#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
+#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M - 1)
+
+#define MODULES_VADDR  (CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M)
+#define MODULES_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
 
 struct mm_struct;
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 76e0a42d6e36..9c97b7513853 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -21,23 +21,12 @@
 
 #include 
 
-/*
- * Modules should NOT be allocated with kmalloc for (obvious) reasons.
- * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
- * from 0x8000 (vmalloc area) to 0xc (kernel) (kmalloc returns
- * addresses in 0xc000)
- */
 void *module_alloc(unsigned long size)
 {
-   if (size == 0)
-   return NULL;
-   return kmalloc(size, GFP_KERNEL);
-}
-
-/* Free memory returned from module_alloc */
-void module_memfree(void *module_region)
-{
-   kfree(module_region);
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+   GFP_KERNEL, PAGE_KERNEL_EXEC,
+   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
-- 
2.43.0

[PATCH RESEND v8 02/16] mips: module: rename MODULE_START to MODULES_VADDR

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

and MODULE_END to MODULES_END to match other architectures that define
custom address space for modules.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/mips/include/asm/pgtable-64.h | 4 ++--
 arch/mips/kernel/module.c  | 4 ++--
 arch/mips/mm/fault.c   | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/pgtable-64.h 
b/arch/mips/include/asm/pgtable-64.h
index 20ca48c1b606..c0109aff223b 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -147,8 +147,8 @@
 #if defined(CONFIG_MODULES) && defined(KBUILD_64BIT_SYM32) && \
VMALLOC_START != CKSSEG
 /* Load modules into 32bit-compatible segment. */
-#define MODULE_START   CKSSEG
-#define MODULE_END (FIXADDR_START-2*PAGE_SIZE)
+#define MODULES_VADDR  CKSSEG
+#define MODULES_END(FIXADDR_START-2*PAGE_SIZE)
 #endif
 
 #define pte_ERROR(e) \
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 7b2fbaa9cac5..9a6c96014904 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -31,10 +31,10 @@ struct mips_hi16 {
 static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
-#ifdef MODULE_START
+#ifdef MODULES_VADDR
 void *module_alloc(unsigned long size)
 {
-   return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index aaa9a242ebba..37fedeaca2e9 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -83,8 +83,8 @@ static void __do_page_fault(struct pt_regs *regs, unsigned 
long write,
 
if (unlikely(address >= VMALLOC_START && address <= VMALLOC_END))
goto VMALLOC_FAULT_TARGET;
-#ifdef MODULE_START
-   if (unlikely(address >= MODULE_START && address < MODULE_END))
+#ifdef MODULES_VADDR
+   if (unlikely(address >= MODULES_VADDR && address < MODULES_END))
goto VMALLOC_FAULT_TARGET;
 #endif
 
-- 
2.43.0

[PATCH RESEND v8 01/16] arm64: module: remove unneeded call to kasan_alloc_module_shadow()

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Since commit f6f37d9320a1 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS
modes") KASAN_VMALLOC is always enabled when KASAN is on. This means
that allocations in module_alloc() will be tracked by KASAN protection
for vmalloc() and that kasan_alloc_module_shadow() will be always an
empty inline and there is no point in calling it.

Drop meaningless call to kasan_alloc_module_shadow() from
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm64/kernel/module.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 47e0be610bb6..e92da4da1b2a 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -141,11 +141,6 @@ void *module_alloc(unsigned long size)
__func__);
}
 
-   if (p && (kasan_alloc_module_shadow(p, size, GFP_KERNEL) < 0)) {
-   vfree(p);
-   return NULL;
-   }
-
/* Memory is intended to be executable, reset the pointer tag. */
return kasan_reset_tag(p);
 }
-- 
2.43.0

[PATCH RESEND v8 00/16] mm: jit/text allocator

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Hi,

The patches are also available in git:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v8

v8:
* fix intialization of default_execmem_info

v7: https://lore.kernel.org/all/20240429121620.1186447-1-r...@kernel.org
* define MODULE_{VADDR,END} for riscv32 to fix the build and avoid
  #ifdefs in a function body
* add Acks, thanks everybody

v6: https://lore.kernel.org/all/20240426082854.7355-1-r...@kernel.org
* restore patch "arm64: extend execmem_info for generated code
  allocations" that disappeared in v5 rebase
* update execmem initialization so that by default it will be
  initialized early while late initialization will be an opt-in

v5: https://lore.kernel.org/all/20240422094436.3625171-1-r...@kernel.org
* rebase on v6.9-rc4 to avoid a conflict in kprobes
* add copyrights to mm/execmem.c (Luis)
* fix spelling (Ingo)
* define MODULES_VADDDR for sparc (Sam)
* consistently initialize struct execmem_info (Peter)
* reduce #ifdefs in function bodies in kprobes (Masami) 

v4: https://lore.kernel.org/all/20240411160051.2093261-1-r...@kernel.org
* rebase on v6.9-rc2
* rename execmem_params to execmem_info and execmem_arch_params() to
  execmem_arch_setup()
* use single execmem_alloc() API instead of execmem_{text,data}_alloc() (Song)
* avoid extra copy of execmem parameters (Rick)
* run execmem_init() as core_initcall() except for the architectures that
  may allocated text really early (currently only x86) (Will)
* add acks for some of arm64 and riscv changes, thanks Will and Alexandre
* new commits:
  - drop call to kasan_alloc_module_shadow() on arm64 because it's not
needed anymore
  - rename MODULE_START to MODULES_VADDR on MIPS
  - use CONFIG_EXECMEM instead of CONFIG_MODULES on powerpc as per Christophe:
https://lore.kernel.org/all/79062fa3-3402-47b3-8920-9231ad05e...@csgroup.eu/

v3: https://lore.kernel.org/all/20230918072955.2507221-1-r...@kernel.org
* add type parameter to execmem allocation APIs
* remove BPF dependency on modules

v2: https://lore.kernel.org/all/20230616085038.4121892-1-r...@kernel.org
* Separate "module" and "others" allocations with execmem_text_alloc()
and jit_text_alloc()
* Drop ROX entailment on x86
* Add ack for nios2 changes, thanks Dinh Nguyen

v1: https://lore.kernel.org/all/20230601101257.530867-1-r...@kernel.org

= Cover letter from v1 (sligtly updated) =

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystmes
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

A centralized infrastructure for code allocation allows allocations of
executable memory as ROX, and future optimizations such as caching large
pages for better iTLB performance and providing sub-page allocations for
users that only need small jit code snippets.

Rick Edgecombe proposed perm_alloc extension to vmalloc [1] and Song Liu
proposed execmem_alloc [2], but both these approaches were targeting BPF
allocations and lacked the ground work to abstract executable allocations
and split them from the modules core.

Thomas Gleixner suggested to express module allocation restrictions and
requirements as struct mod_alloc_type_params [3] that would define ranges,
protections and other parameters for different types of allocations used by
modules and following that suggestion Song separated allocations of
different types in modules (commit ac3b43283923 ("module: replace
module_layout with module_memory")) and posted "Type aware module
allocator" set [4].

I liked the idea of parametrising code allocation requirements as a
structure, but I believe the original proposal and Song's module allocator
was too module centric, so I came up with these patches.

This set splits code allocation from modules by introducing execmem_alloc()
and and execmem_free(), APIs, replaces call sites of module_alloc() and
module_memfree() with the new APIs and implements core text and related
allocations in a central place.

Instead of architecture specific overrides for module_alloc(), the
architectures that require non-default behaviour for text allocation must
fill execmem_info structure and implement execmem_arch_setup() that returns
a pointer to that structure. If an architecture does not implement
execmem_arch_setup(), the defaults compatible with the current
modules::module_alloc() are used.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem APIs
take a type argument, that will be used to identify the calling subsystem
and to allow architectures to define par

Re: [PATCH v6 3/5] regulator: add regulators driver for Marvell 88PM886 PMIC

2024-05-05 Thread Mark Brown

On Sat, May 04, 2024 at 09:37:06PM +0200, Karel Balej wrote:

> +static const struct regulator_ops pm886_ldo_ops = {
> + .list_voltage = regulator_list_voltage_table,
> + .map_voltage = regulator_map_voltage_iterate,
> + .set_voltage_sel = regulator_set_voltage_sel_regmap,
> + .get_voltage_sel = regulator_get_voltage_sel_regmap,
> + .enable = regulator_enable_regmap,
> + .disable = regulator_disable_regmap,
> + .is_enabled = regulator_is_enabled_regmap,
> + .get_current_limit = pm886_regulator_get_ilim,

Do these regulators actually enforce this limit or is this just a spec
limit beyond which regulation may fail?  If it's just a spec limit I'd
not expect this operation to be provided, it's more for a hard limit
where the regulator will detect and act on issues.  I don't see an error
interrupt or anything and this would be an unusual feature for a LDO.

signature.asc
Description: PGP signature

[PATCH v8 17/17] fixup: convert remaining archs: defaults handling

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Signed-off-by: Mike Rapoport (IBM) 
---
 mm/execmem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/execmem.c b/mm/execmem.c
index f6dc3fabc1ca..0c4b36bc6d10 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -118,7 +118,6 @@ static void __init __execmem_init(void)
info->ranges[EXECMEM_DEFAULT].end = VMALLOC_END;
info->ranges[EXECMEM_DEFAULT].pgprot = PAGE_KERNEL_EXEC;
info->ranges[EXECMEM_DEFAULT].alignment = 1;
-   return;
}
 
if (!execmem_validate(info))
-- 
2.43.0

[PATCH v8 16/17] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

BPF just-in-time compiler depended on CONFIG_MODULES because it used
module_alloc() to allocate memory for the generated code.

Since code allocations are now implemented with execmem, drop dependency of
CONFIG_BPF_JIT on CONFIG_MODULES and make it select CONFIG_EXECMEM.

Suggested-by: Björn Töpel 
Signed-off-by: Mike Rapoport (IBM) 
---
 kernel/bpf/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index bc25f5098a25..f999e4e0b344 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -43,7 +43,7 @@ config BPF_JIT
bool "Enable BPF Just In Time compiler"
depends on BPF
depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
-   depends on MODULES
+   select EXECMEM
help
  BPF programs are normally handled by a BPF interpreter. This option
  allows the kernel to generate native code when a program is loaded
-- 
2.43.0

[PATCH v8 15/17] kprobes: remove dependency on CONFIG_MODULES

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

kprobes depended on CONFIG_MODULES because it has to allocate memory for
code.

Since code allocations are now implemented with execmem, kprobes can be
enabled in non-modular kernels.

Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
dependency of CONFIG_KPROBES on CONFIG_MODULES.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Masami Hiramatsu (Google) 
---
 arch/Kconfig|  2 +-
 include/linux/module.h  |  9 ++
 kernel/kprobes.c| 55 +++--
 kernel/trace/trace_kprobe.c | 20 +-
 4 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 4fd0daa54e6c..caa459964f09 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -52,9 +52,9 @@ config GENERIC_ENTRY
 
 config KPROBES
bool "Kprobes"
-   depends on MODULES
depends on HAVE_KPROBES
select KALLSYMS
+   select EXECMEM
select TASKS_RCU if PREEMPTION
help
  Kprobes allows you to trap at almost any kernel address and
diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..ffa1c603163c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -605,6 +605,11 @@ static inline bool module_is_live(struct module *mod)
return mod->state != MODULE_STATE_GOING;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+return mod->state == MODULE_STATE_COMING;
+}
+
 struct module *__module_text_address(unsigned long addr);
 struct module *__module_address(unsigned long addr);
 bool is_module_address(unsigned long addr);
@@ -857,6 +862,10 @@ void *dereference_module_function_descriptor(struct module 
*mod, void *ptr)
return ptr;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+   return false;
+}
 #endif /* CONFIG_MODULES */
 
 #ifdef CONFIG_SYSFS
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ddd7cdc16edf..ca2c6cbd42d2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1588,7 +1588,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
}
 
/* Get module refcount and reject __init functions for loaded modules. 
*/
-   if (*probed_mod) {
+   if (IS_ENABLED(CONFIG_MODULES) && *probed_mod) {
/*
 * We must hold a refcount of the probed module while updating
 * its code to prohibit unexpected unloading.
@@ -1603,12 +1603,13 @@ static int check_kprobe_address_safe(struct kprobe *p,
 * kprobes in there.
 */
if (within_module_init((unsigned long)p->addr, *probed_mod) &&
-   (*probed_mod)->state != MODULE_STATE_COMING) {
+   !module_is_coming(*probed_mod)) {
module_put(*probed_mod);
*probed_mod = NULL;
ret = -ENOENT;
}
}
+
 out:
preempt_enable();
jump_label_unlock();
@@ -2488,24 +2489,6 @@ int kprobe_add_area_blacklist(unsigned long start, 
unsigned long end)
return 0;
 }
 
-/* Remove all symbols in given area from kprobe blacklist */
-static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
-{
-   struct kprobe_blacklist_entry *ent, *n;
-
-   list_for_each_entry_safe(ent, n, &kprobe_blacklist, list) {
-   if (ent->start_addr < start || ent->start_addr >= end)
-   continue;
-   list_del(&ent->list);
-   kfree(ent);
-   }
-}
-
-static void kprobe_remove_ksym_blacklist(unsigned long entry)
-{
-   kprobe_remove_area_blacklist(entry, entry + 1);
-}
-
 int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
   char *type, char *sym)
 {
@@ -2570,6 +2553,25 @@ static int __init populate_kprobe_blacklist(unsigned 
long *start,
return ret ? : arch_populate_kprobe_blacklist();
 }
 
+#ifdef CONFIG_MODULES
+/* Remove all symbols in given area from kprobe blacklist */
+static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
+{
+   struct kprobe_blacklist_entry *ent, *n;
+
+   list_for_each_entry_safe(ent, n, &kprobe_blacklist, list) {
+   if (ent->start_addr < start || ent->start_addr >= end)
+   continue;
+   list_del(&ent->list);
+   kfree(ent);
+   }
+}
+
+static void kprobe_remove_ksym_blacklist(unsigned long entry)
+{
+   kprobe_remove_area_blacklist(entry, entry + 1);
+}
+
 static void add_module_kprobe_blacklist(struct module *mod)
 {
unsigned long start, end;
@@ -2672,6 +2674,17 @@ static struct notifier_block kprobe_module_nb = {
.priority = 0
 };
 
+static int kprobe_register_module_notifier(void)
+{
+   return register_module_notifier(&kprobe_module_nb);
+}
+#el

[PATCH v8 14/17] powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

There are places where CONFIG_MODULES guards the code that depends on
memory allocation being done with module_alloc().

Replace CONFIG_MODULES with CONFIG_EXECMEM in such places.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/include/asm/kasan.h | 2 +-
 arch/powerpc/kernel/head_8xx.S   | 4 ++--
 arch/powerpc/kernel/head_book3s_32.S | 6 +++---
 arch/powerpc/lib/code-patching.c | 2 +-
 arch/powerpc/mm/book3s32/mmu.c   | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..2e586733a464 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -285,7 +285,7 @@ config PPC
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
-   select KASAN_VMALLOCif KASAN && MODULES
+   select KASAN_VMALLOCif KASAN && EXECMEM
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_PAGE_SIZE
select MMU_GATHER_RCU_TABLE_FREE
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 365d2720097c..b5bbb94c51f6 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -19,7 +19,7 @@
 
 #define KASAN_SHADOW_SCALE_SHIFT   3
 
-#if defined(CONFIG_MODULES) && defined(CONFIG_PPC32)
+#if defined(CONFIG_EXECMEM) && defined(CONFIG_PPC32)
 #define KASAN_KERN_START   ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
 #else
 #define KASAN_KERN_START   PAGE_OFFSET
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 647b0b445e89..edc479a7c2bc 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -199,12 +199,12 @@ instruction_counter:
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r10, r11)
mtspr   SPRN_MD_EPN, r10
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
mfcrr11
compare_to_kernel_boundary r10, r10
 #endif
mfspr   r10, SPRN_M_TWB /* Get level 1 table */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
blt+3f
rlwinm  r10, r10, 0, 20, 31
orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index c1d89764dd22..57196883a00e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -419,14 +419,14 @@ InstructionTLBMiss:
  */
/* Get PTE (linux-style) and check access */
mfspr   r3,SPRN_IMISS
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
 #endif
mfspr   r2, SPRN_SDR1
li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
rlwinm  r2, r2, 28, 0xf000
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
li  r0, 3
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
@@ -442,7 +442,7 @@ InstructionTLBMiss:
andc.   r1,r1,r2/* check access & ~permission */
bne-InstructionAddressInvalid /* return if access not permitted */
/* Convert linux-style PTE to low word of PPC-style PTE */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
rlwimi  r2, r0, 0, 31, 31   /* userspace ? -> PP lsb */
 #endif
ori r1, r1, 0xe06   /* clear out reserved bits */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..7af791446ddf 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -225,7 +225,7 @@ void __init poking_init(void)
 
 static unsigned long get_patch_pfn(void *addr)
 {
-   if (IS_ENABLED(CONFIG_MODULES) && is_vmalloc_or_module_addr(addr))
+   if (IS_ENABLED(CONFIG_EXECMEM) && is_vmalloc_or_module_addr(addr))
return vmalloc_to_pfn(addr);
else
return __pa_symbol(addr) >> PAGE_SHIFT;
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 100f999871bc..625fe7d08e06 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -184,7 +184,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 
 static bool is_module_segment(unsigned long addr)
 {
-   if (!IS_ENABLED(CONFIG_MODULES))
+   if (!IS_ENABLED(CONFIG_EXECMEM))
return false;
if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
return false;
-- 
2.43.0

[PATCH v8 13/17] x86/ftrace: enable dynamic ftrace without CONFIG_MODULES

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Dynamic ftrace must allocate memory for code and this was impossible
without CONFIG_MODULES.

With execmem separated from the modules code, execmem_text_alloc() is
available regardless of CONFIG_MODULES.

Remove dependency of dynamic ftrace on CONFIG_MODULES and make
CONFIG_DYNAMIC_FTRACE select CONFIG_EXECMEM in Kconfig.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/x86/Kconfig |  1 +
 arch/x86/kernel/ftrace.c | 10 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4474bf32d0a4..f2917ccf4fb4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,6 +34,7 @@ config X86_64
select SWIOTLB
select ARCH_HAS_ELFCORE_COMPAT
select ZONE_DMA32
+   select EXECMEM if DYNAMIC_FTRACE
 
 config FORCE_DYNAMIC_FTRACE
def_bool y
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index c8ddb7abda7c..8da0e66ca22d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -261,8 +261,6 @@ void arch_ftrace_update_code(int command)
 /* Currently only x86_64 supports dynamic trampolines */
 #ifdef CONFIG_X86_64
 
-#ifdef CONFIG_MODULES
-/* Module allocation simplifies allocating memory for code */
 static inline void *alloc_tramp(unsigned long size)
 {
return execmem_alloc(EXECMEM_FTRACE, size);
@@ -271,14 +269,6 @@ static inline void tramp_free(void *tramp)
 {
execmem_free(tramp);
 }
-#else
-/* Trampolines can only be created if modules are supported */
-static inline void *alloc_tramp(unsigned long size)
-{
-   return NULL;
-}
-static inline void tramp_free(void *tramp) { }
-#endif
 
 /* Defined as markers to the end of the ftrace default trampolines */
 extern void ftrace_regs_caller_end(void);
-- 
2.43.0

[PATCH v8 12/17] arch: make execmem setup available regardless of CONFIG_MODULES

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

execmem does not depend on modules, on the contrary modules use
execmem.

To make execmem available when CONFIG_MODULES=n, for instance for
kprobes, split execmem_params initialization out from
arch/*/kernel/module.c and compile it when CONFIG_EXECMEM=y

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Philippe Mathieu-Daudé 
---
 arch/arm/kernel/module.c   |  43 --
 arch/arm/mm/init.c |  45 +++
 arch/arm64/kernel/module.c | 140 -
 arch/arm64/mm/init.c   | 140 +
 arch/loongarch/kernel/module.c |  19 -
 arch/loongarch/mm/init.c   |  21 +
 arch/mips/kernel/module.c  |  22 --
 arch/mips/mm/init.c|  23 ++
 arch/nios2/kernel/module.c |  20 -
 arch/nios2/mm/init.c   |  21 +
 arch/parisc/kernel/module.c|  20 -
 arch/parisc/mm/init.c  |  23 +-
 arch/powerpc/kernel/module.c   |  63 ---
 arch/powerpc/mm/mem.c  |  64 +++
 arch/riscv/kernel/module.c |  34 
 arch/riscv/mm/init.c   |  35 +
 arch/s390/kernel/module.c  |  27 ---
 arch/s390/mm/init.c|  30 +++
 arch/sparc/kernel/module.c |  19 -
 arch/sparc/mm/Makefile |   2 +
 arch/sparc/mm/execmem.c|  21 +
 arch/x86/kernel/module.c   |  27 ---
 arch/x86/mm/init.c |  29 +++
 23 files changed, 453 insertions(+), 435 deletions(-)
 create mode 100644 arch/sparc/mm/execmem.c

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index a98fdf6ff26c..677f218f7e84 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -12,57 +12,14 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
 
 #include 
 #include 
 #include 
 #include 
 
-#ifdef CONFIG_XIP_KERNEL
-/*
- * The XIP kernel text is mapped in the module area for modules and
- * some other stuff to work without any indirect relocations.
- * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
- * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
- */
-#undef MODULES_VADDR
-#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
-#endif
-
-#ifdef CONFIG_MMU
-static struct execmem_info execmem_info __ro_after_init;
-
-struct execmem_info __init *execmem_arch_setup(void)
-{
-   unsigned long fallback_start = 0, fallback_end = 0;
-
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
-   fallback_start = VMALLOC_START;
-   fallback_end = VMALLOC_END;
-   }
-
-   execmem_info = (struct execmem_info){
-   .ranges = {
-   [EXECMEM_DEFAULT] = {
-   .start  = MODULES_VADDR,
-   .end= MODULES_END,
-   .pgprot = PAGE_KERNEL_EXEC,
-   .alignment = 1,
-   .fallback_start = fallback_start,
-   .fallback_end   = fallback_end,
-   },
-   },
-   };
-
-   return &execmem_info;
-}
-#endif
-
 bool module_init_section(const char *name)
 {
return strstarts(name, ".init") ||
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index e8c6f4be0ce1..5345d218899a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -486,3 +487,47 @@ void free_initrd_mem(unsigned long start, unsigned long 
end)
free_reserved_area((void *)start, (void *)end, -1, "initrd");
 }
 #endif
+
+#ifdef CONFIG_EXECMEM
+
+#ifdef CONFIG_XIP_KERNEL
+/*
+ * The XIP kernel text is mapped in the module area for modules and
+ * some other stuff to work without any indirect relocations.
+ * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
+ * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
+ */
+#undef MODULES_VADDR
+#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
+#endif
+
+#ifdef CONFIG_MMU
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
+{
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+

[PATCH v8 11/17] powerpc: extend execmem_params for kprobes allocations

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

powerpc overrides kprobes::alloc_insn_page() to remove writable
permissions when STRICT_MODULE_RWX is on.

Add definition of EXECMEM_KRPOBES to execmem_params to allow using the
generic kprobes::alloc_insn_page() with the desired permissions.

As powerpc uses breakpoint instructions to inject kprobes, it does not
need to constrain kprobe allocations to the modules area and can use the
entire vmalloc address space.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/kernel/kprobes.c | 20 
 arch/powerpc/kernel/module.c  |  7 +++
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9fcd01bb2ce6..14c5ddec3056 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -126,26 +126,6 @@ kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long 
addr, unsigned long offse
return (kprobe_opcode_t *)(addr + offset);
 }
 
-void *alloc_insn_page(void)
-{
-   void *page;
-
-   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
-   if (!page)
-   return NULL;
-
-   if (strict_module_rwx_enabled()) {
-   int err = set_memory_rox((unsigned long)page, 1);
-
-   if (err)
-   goto error;
-   }
-   return page;
-error:
-   execmem_free(page);
-   return NULL;
-}
-
 int arch_prepare_kprobe(struct kprobe *p)
 {
int ret = 0;
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index ac80559015a3..2a23cf7e141b 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -94,6 +94,7 @@ static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
 {
+   pgprot_t kprobes_prot = strict_module_rwx_enabled() ? PAGE_KERNEL_ROX : 
PAGE_KERNEL_EXEC;
pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : 
PAGE_KERNEL_EXEC;
unsigned long fallback_start = 0, fallback_end = 0;
unsigned long start, end;
@@ -132,6 +133,12 @@ struct execmem_info __init *execmem_arch_setup(void)
.fallback_start = fallback_start,
.fallback_end   = fallback_end,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = kprobes_prot,
+   .alignment = 1,
+   },
[EXECMEM_MODULE_DATA] = {
.start  = VMALLOC_START,
.end= VMALLOC_END,
-- 
2.43.0

[PATCH v8 10/17] arm64: extend execmem_info for generated code allocations

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

The memory allocations for kprobes and BPF on arm64 can be placed
anywhere in vmalloc address space and currently this is implemented with
overrides of alloc_insn_page() and bpf_jit_alloc_exec() in arm64.

Define EXECMEM_KPROBES and EXECMEM_BPF ranges in arm64::execmem_info and
drop overrides of alloc_insn_page() and bpf_jit_alloc_exec().

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Will Deacon 
---
 arch/arm64/kernel/module.c | 12 
 arch/arm64/kernel/probes/kprobes.c |  7 ---
 arch/arm64/net/bpf_jit_comp.c  | 11 ---
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index b7a7a23f9f8f..a52240ea084b 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -146,6 +146,18 @@ struct execmem_info __init *execmem_arch_setup(void)
.fallback_start = fallback_start,
.fallback_end   = fallback_end,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL_ROX,
+   .alignment = 1,
+   },
+   [EXECMEM_BPF] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
},
};
 
diff --git a/arch/arm64/kernel/probes/kprobes.c 
b/arch/arm64/kernel/probes/kprobes.c
index 327855a11df2..4268678d0e86 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -129,13 +129,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
return 0;
 }
 
-void *alloc_insn_page(void)
-{
-   return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS,
-   NUMA_NO_NODE, __builtin_return_address(0));
-}
-
 /* arm kprobe: install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 122021f9bdfc..456f5af239fc 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1793,17 +1793,6 @@ u64 bpf_jit_alloc_exec_limit(void)
return VMALLOC_END - VMALLOC_START;
 }
 
-void *bpf_jit_alloc_exec(unsigned long size)
-{
-   /* Memory is intended to be executable, reset the pointer tag. */
-   return kasan_reset_tag(vmalloc(size));
-}
-
-void bpf_jit_free_exec(void *addr)
-{
-   return vfree(addr);
-}
-
 /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool bpf_jit_supports_subprog_tailcalls(void)
 {
-- 
2.43.0

[PATCH v8 09/17] riscv: extend execmem_params for generated code allocations

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

The memory allocations for kprobes and BPF on RISC-V are not placed in
the modules area and these custom allocations are implemented with
overrides of alloc_insn_page() and  bpf_jit_alloc_exec().

Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END for
32 bit and slightly reorder execmem_params initialization to support both
32 and 64 bit variants, define EXECMEM_KPROBES and EXECMEM_BPF ranges in
riscv::execmem_params and drop overrides of alloc_insn_page() and
bpf_jit_alloc_exec().

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Alexandre Ghiti 
---
 arch/riscv/include/asm/pgtable.h   |  3 +++
 arch/riscv/kernel/module.c | 14 +-
 arch/riscv/kernel/probes/kprobes.c | 10 --
 arch/riscv/net/bpf_jit_core.c  | 13 -
 4 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9f8ea0e33eb1..5f21814e438e 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -55,6 +55,9 @@
 #define MODULES_LOWEST_VADDR   (KERNEL_LINK_ADDR - SZ_2G)
 #define MODULES_VADDR  (PFN_ALIGN((unsigned long)&_end) - SZ_2G)
 #define MODULES_END(PFN_ALIGN((unsigned long)&_start))
+#else
+#define MODULES_VADDR  VMALLOC_START
+#define MODULES_ENDVMALLOC_END
 #endif
 
 /*
diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
index 182904127ba0..0e6415f00fca 100644
--- a/arch/riscv/kernel/module.c
+++ b/arch/riscv/kernel/module.c
@@ -906,7 +906,7 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
+#ifdef CONFIG_MMU
 static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
@@ -919,6 +919,18 @@ struct execmem_info __init *execmem_arch_setup(void)
.pgprot = PAGE_KERNEL,
.alignment = 1,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL_READ_EXEC,
+   .alignment = 1,
+   },
+   [EXECMEM_BPF] = {
+   .start  = BPF_JIT_REGION_START,
+   .end= BPF_JIT_REGION_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = PAGE_SIZE,
+   },
},
};
 
diff --git a/arch/riscv/kernel/probes/kprobes.c 
b/arch/riscv/kernel/probes/kprobes.c
index 2f08c14a933d..e64f2f3064eb 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -104,16 +104,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
return 0;
 }
 
-#ifdef CONFIG_MMU
-void *alloc_insn_page(void)
-{
-   return  __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
-GFP_KERNEL, PAGE_KERNEL_READ_EXEC,
-VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-__builtin_return_address(0));
-}
-#endif
-
 /* install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 6b3acac30c06..e238fdbd5dbc 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -219,19 +219,6 @@ u64 bpf_jit_alloc_exec_limit(void)
return BPF_JIT_REGION_SIZE;
 }
 
-void *bpf_jit_alloc_exec(unsigned long size)
-{
-   return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
-   BPF_JIT_REGION_END, GFP_KERNEL,
-   PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-}
-
-void bpf_jit_free_exec(void *addr)
-{
-   return vfree(addr);
-}
-
 void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
int ret;
-- 
2.43.0

[PATCH v8 08/17] mm/execmem, arch: convert remaining overrides of module_alloc to execmem

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Extend execmem parameters to accommodate more complex overrides of
module_alloc() by architectures.

This includes specification of a fallback range required by arm, arm64
and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for
allocation of KASAN shadow required by s390 and x86 and support for
late initialization of execmem required by arm64.

The core implementation of execmem_alloc() takes care of suppressing
warnings when the initial allocation fails but there is a fallback range
defined.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Will Deacon 
Acked-by: Song Liu 
Tested-by: Liviu Dudau 
---
 arch/Kconfig |  8 
 arch/arm/kernel/module.c | 41 
 arch/arm64/Kconfig   |  1 +
 arch/arm64/kernel/module.c   | 55 ++
 arch/powerpc/kernel/module.c | 60 +++--
 arch/s390/kernel/module.c| 54 +++---
 arch/x86/kernel/module.c | 70 +++--
 include/linux/execmem.h  | 30 ++-
 include/linux/moduleloader.h | 12 --
 kernel/module/main.c | 26 +++--
 mm/execmem.c | 75 ++--
 11 files changed, 247 insertions(+), 185 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 65afb1de48b3..4fd0daa54e6c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -960,6 +960,14 @@ config ARCH_WANTS_MODULES_DATA_IN_VMALLOC
  For architectures like powerpc/32 which have constraints on module
  allocation and need to allocate module data outside of module area.
 
+config ARCH_WANTS_EXECMEM_LATE
+   bool
+   help
+ For architectures that do not allocate executable memory early on
+ boot, but rather require its initialization late when there is
+ enough entropy for module space randomization, for instance
+ arm64.
+
 config HAVE_IRQ_EXIT_ON_IRQ_STACK
bool
help
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index e74d84f58b77..a98fdf6ff26c 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,23 +35,31 @@
 #endif
 
 #ifdef CONFIG_MMU
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   gfp_t gfp_mask = GFP_KERNEL;
-   void *p;
-
-   /* Silence the initial allocation */
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
-   gfp_mask |= __GFP_NOWARN;
-
-   p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-   if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
-   return p;
-   return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+   },
+   },
+   };
+
+   return &execmem_info;
 }
 #endif
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..74b34a78b7ac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -105,6 +105,7 @@ config ARM64
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES 
&& !ARM64_VA_BITS_36)
select ARCH_WANT_LD_ORPHAN_WARN
+   select ARCH_WANTS_EXECMEM_LATE if EXECMEM
select ARCH_WANTS_NO_INSTR
select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
select ARCH_HAS_UBSAN
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index e92da4da1b2a..b7a7a23f9f8f 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -108,41 +109,47 @@ static int __init module_init_limits(void)
 
return 0;
 }
-subsys_initcall(module_init_limits);
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *exec

[PATCH v8 07/17] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Several architectures override module_alloc() only to define address
range for code allocations different than VMALLOC address space.

Provide a generic implementation in execmem that uses the parameters for
address space ranges, required alignment and page protections provided
by architectures.

The architectures must fill execmem_info structure and implement
execmem_arch_setup() that returns a pointer to that structure. This way the
execmem initialization won't be called from every architecture, but rather
from a central place, namely a core_initcall() in execmem.

The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
with the parameters defined by the architectures.  If an architecture does
not implement execmem_arch_setup(), execmem_alloc() will fall back to
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Song Liu 
---
 arch/loongarch/kernel/module.c | 19 --
 arch/mips/kernel/module.c  | 20 --
 arch/nios2/kernel/module.c | 21 ---
 arch/parisc/kernel/module.c| 24 
 arch/riscv/kernel/module.c | 24 
 arch/sparc/kernel/module.c | 20 --
 include/linux/execmem.h| 47 
 mm/execmem.c   | 67 --
 mm/mm_init.c   |  2 +
 9 files changed, 210 insertions(+), 34 deletions(-)

diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
index c7d0338d12c1..ca6dd7ea1610 100644
--- a/arch/loongarch/kernel/module.c
+++ b/arch/loongarch/kernel/module.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -490,10 +491,22 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, 
__builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return &execmem_info;
 }
 
 static void module_init_ftrace_plt(const Elf_Ehdr *hdr,
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 9a6c96014904..59225a3cf918 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct mips_hi16 {
@@ -32,11 +33,22 @@ static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
 #ifdef MODULES_VADDR
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return &execmem_info;
 }
 #endif
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 9c97b7513853..0d1ee86631fc 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -18,15 +18,26 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC,
-   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return &execmem_info;
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
diff --g

[PATCH v8 06/17] mm: introduce execmem_alloc() and execmem_free()

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystems
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

Start splitting code allocation from modules by introducing execmem_alloc()
and execmem_free() APIs.

Initially, execmem_alloc() is a wrapper for module_alloc() and
execmem_free() is a replacement of module_memfree() to allow updating all
call sites to use the new APIs.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem_alloc() takes
a type argument, that will be used to identify the calling subsystem and to
allow architectures define parameters for ranges suitable for that
subsystem.

No functional changes.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Masami Hiramatsu (Google) 
Acked-by: Song Liu 
---
 arch/powerpc/kernel/kprobes.c|  6 ++--
 arch/s390/kernel/ftrace.c|  4 +--
 arch/s390/kernel/kprobes.c   |  4 +--
 arch/s390/kernel/module.c|  5 +--
 arch/sparc/net/bpf_jit_comp_32.c |  8 ++---
 arch/x86/kernel/ftrace.c |  6 ++--
 arch/x86/kernel/kprobes/core.c   |  4 +--
 include/linux/execmem.h  | 57 
 include/linux/moduleloader.h |  3 --
 kernel/bpf/core.c|  6 ++--
 kernel/kprobes.c |  8 ++---
 kernel/module/Kconfig|  1 +
 kernel/module/main.c | 25 +-
 mm/Kconfig   |  3 ++
 mm/Makefile  |  1 +
 mm/execmem.c | 32 ++
 16 files changed, 128 insertions(+), 45 deletions(-)
 create mode 100644 include/linux/execmem.h
 create mode 100644 mm/execmem.c

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index bbca90a5e2ec..9fcd01bb2ce6 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -19,8 +19,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -130,7 +130,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
 
@@ -142,7 +142,7 @@ void *alloc_insn_page(void)
}
return page;
 error:
-   module_memfree(page);
+   execmem_free(page);
return NULL;
 }
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..798249ef5646 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -7,13 +7,13 @@
  *   Author(s): Martin Schwidefsky 
  */
 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -220,7 +220,7 @@ static int __init ftrace_plt_init(void)
 {
const char *start, *end;
 
-   ftrace_plt = module_alloc(PAGE_SIZE);
+   ftrace_plt = execmem_alloc(EXECMEM_FTRACE, PAGE_SIZE);
if (!ftrace_plt)
panic("cannot allocate ftrace plt\n");
 
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index f0cf20d4b3c5..3c1b1be744de 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -9,7 +9,6 @@
 
 #define pr_fmt(fmt) "kprobes: " fmt
 
-#include 
 #include 
 #include 
 #include 
@@ -21,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -38,7 +38,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
set_memory_rox((unsigned long)page, 1);
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 42215f9404af..ac97a905e8cd 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,7 +77,7 @@ void *module_alloc(unsigned long size)
 #ifdef CONFIG_FUNCTION_TRACER
 void module_arch_cleanup(struct module *mod)
 {
-   module_memfree(mod->arch.trampolines_start);
+   execmem_free(mod->arch.trampolines_start);
 }
 #endif
 
@@ -510,7 +511,7 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct 
module *me,
 
size = FTRACE_HOTPATCH_TRAMPOLINES_SIZE(s->sh_size);
numpages = DIV_ROUND_UP(size, PAGE_SIZE);
-   start = module_alloc(numpages * PAGE_SIZE);
+   start = execmem_alloc(EXECMEM_FTRACE, numpages * PAGE_SIZE);
if (!start)

[PATCH v8 05/17] module: make module_memory_{alloc,free} more self-contained

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Move the logic related to the memory allocation and freeing into
module_memory_alloc() and module_memory_free().

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Philippe Mathieu-Daudé 
---
 kernel/module/main.c | 64 +++-
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..5b82b069e0d3 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1203,15 +1203,44 @@ static bool mod_mem_use_vmalloc(enum mod_mem_type type)
mod_mem_type_is_core_data(type);
 }
 
-static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
+static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
 {
+   unsigned int size = PAGE_ALIGN(mod->mem[type].size);
+   void *ptr;
+
+   mod->mem[type].size = size;
+
if (mod_mem_use_vmalloc(type))
-   return vzalloc(size);
-   return module_alloc(size);
+   ptr = vmalloc(size);
+   else
+   ptr = module_alloc(size);
+
+   if (!ptr)
+   return -ENOMEM;
+
+   /*
+* The pointer to these blocks of memory are stored on the module
+* structure and we keep that around so long as the module is
+* around. We only free that memory when we unload the module.
+* Just mark them as not being a leak then. The .init* ELF
+* sections *do* get freed after boot so we *could* treat them
+* slightly differently with kmemleak_ignore() and only grey
+* them out as they work as typical memory allocations which
+* *do* eventually get freed, but let's just keep things simple
+* and avoid *any* false positives.
+*/
+   kmemleak_not_leak(ptr);
+
+   memset(ptr, 0, size);
+   mod->mem[type].base = ptr;
+
+   return 0;
 }
 
-static void module_memory_free(void *ptr, enum mod_mem_type type)
+static void module_memory_free(struct module *mod, enum mod_mem_type type)
 {
+   void *ptr = mod->mem[type].base;
+
if (mod_mem_use_vmalloc(type))
vfree(ptr);
else
@@ -1229,12 +1258,12 @@ static void free_mod_mem(struct module *mod)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
-   module_memory_free(mod_mem->base, type);
+   module_memory_free(mod, type);
}
 
/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, 
mod->mem[MOD_DATA].size);
-   module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
+   module_memory_free(mod, MOD_DATA);
 }
 
 /* Free a module, remove from lists, etc. */
@@ -2225,7 +2254,6 @@ static int find_module_sections(struct module *mod, 
struct load_info *info)
 static int move_module(struct module *mod, struct load_info *info)
 {
int i;
-   void *ptr;
enum mod_mem_type t = 0;
int ret = -ENOMEM;
 
@@ -2234,26 +2262,12 @@ static int move_module(struct module *mod, struct 
load_info *info)
mod->mem[type].base = NULL;
continue;
}
-   mod->mem[type].size = PAGE_ALIGN(mod->mem[type].size);
-   ptr = module_memory_alloc(mod->mem[type].size, type);
-   /*
- * The pointer to these blocks of memory are stored on the 
module
- * structure and we keep that around so long as the module is
- * around. We only free that memory when we unload the module.
- * Just mark them as not being a leak then. The .init* ELF
- * sections *do* get freed after boot so we *could* treat them
- * slightly differently with kmemleak_ignore() and only grey
- * them out as they work as typical memory allocations which
- * *do* eventually get freed, but let's just keep things simple
- * and avoid *any* false positives.
-*/
-   kmemleak_not_leak(ptr);
-   if (!ptr) {
+
+   ret = module_memory_alloc(mod, type);
+   if (ret) {
t = type;
goto out_enomem;
}
-   memset(ptr, 0, mod->mem[type].size);
-   mod->mem[type].base = ptr;
}
 
/* Transfer each section which specifies SHF_ALLOC */
@@ -2296,7 +2310,7 @@ static int move_module(struct module *mod, struct 
load_info *info)
return 0;
 out_enomem:
for (t--; t >= 0; t--)
-   module_memory_free(mod->mem[t].base, t);
+   module_memory_free(mod, t);
return ret;
 }
 
-- 
2.43.0

[PATCH v8 04/17] sparc: simplify module_alloc()

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END
for 32-bit and reduce module_alloc() to

__vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, ...)

as with the new defines the allocations becomes identical for both 32
and 64 bits.

While on it, drop unused include of 

Suggested-by: Sam Ravnborg 
Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Sam Ravnborg 
---
 arch/sparc/include/asm/pgtable_32.h |  2 ++
 arch/sparc/kernel/module.c  | 25 +
 2 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_32.h 
b/arch/sparc/include/asm/pgtable_32.h
index 9e85d57ac3f2..62bcafe38b1f 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -432,6 +432,8 @@ static inline int io_remap_pfn_range(struct vm_area_struct 
*vma,
 
 #define VMALLOC_START   _AC(0xfe60,UL)
 #define VMALLOC_END _AC(0xffc0,UL)
+#define MODULES_VADDR   VMALLOC_START
+#define MODULES_END VMALLOC_END
 
 /* We provide our own get_unmapped_area to cope with VA holes for userland */
 #define HAVE_ARCH_UNMAPPED_AREA
diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
index 66c45a2764bc..d37adb2a0b54 100644
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -21,35 +21,12 @@
 
 #include "entry.h"
 
-#ifdef CONFIG_SPARC64
-
-#include 
-
-static void *module_map(unsigned long size)
+void *module_alloc(unsigned long size)
 {
-   if (PAGE_ALIGN(size) > MODULES_LEN)
-   return NULL;
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
-#else
-static void *module_map(unsigned long size)
-{
-   return vmalloc(size);
-}
-#endif /* CONFIG_SPARC64 */
-
-void *module_alloc(unsigned long size)
-{
-   void *ret;
-
-   ret = module_map(size);
-   if (ret)
-   memset(ret, 0, size);
-
-   return ret;
-}
 
 /* Make generic code ignore STT_REGISTER dummy undefined symbols.  */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
-- 
2.43.0

[PATCH v8 03/17] nios2: define virtual address space for modules

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

nios2 uses kmalloc() to implement module_alloc() because CALL26/PCREL26
cannot reach all of vmalloc address space.

Define module space as 32MiB below the kernel base and switch nios2 to
use vmalloc for module allocations.

Suggested-by: Thomas Gleixner 
Acked-by: Dinh Nguyen 
Acked-by: Song Liu 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/nios2/include/asm/pgtable.h |  5 -
 arch/nios2/kernel/module.c   | 19 ---
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index d052dfcbe8d3..eab87c6beacb 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -25,7 +25,10 @@
 #include 
 
 #define VMALLOC_START  CONFIG_NIOS2_KERNEL_MMU_REGION_BASE
-#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
+#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M - 1)
+
+#define MODULES_VADDR  (CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M)
+#define MODULES_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
 
 struct mm_struct;
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 76e0a42d6e36..9c97b7513853 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -21,23 +21,12 @@
 
 #include 
 
-/*
- * Modules should NOT be allocated with kmalloc for (obvious) reasons.
- * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
- * from 0x8000 (vmalloc area) to 0xc (kernel) (kmalloc returns
- * addresses in 0xc000)
- */
 void *module_alloc(unsigned long size)
 {
-   if (size == 0)
-   return NULL;
-   return kmalloc(size, GFP_KERNEL);
-}
-
-/* Free memory returned from module_alloc */
-void module_memfree(void *module_region)
-{
-   kfree(module_region);
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+   GFP_KERNEL, PAGE_KERNEL_EXEC,
+   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
-- 
2.43.0

[PATCH v8 02/17] mips: module: rename MODULE_START to MODULES_VADDR

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

and MODULE_END to MODULES_END to match other architectures that define
custom address space for modules.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/mips/include/asm/pgtable-64.h | 4 ++--
 arch/mips/kernel/module.c  | 4 ++--
 arch/mips/mm/fault.c   | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/pgtable-64.h 
b/arch/mips/include/asm/pgtable-64.h
index 20ca48c1b606..c0109aff223b 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -147,8 +147,8 @@
 #if defined(CONFIG_MODULES) && defined(KBUILD_64BIT_SYM32) && \
VMALLOC_START != CKSSEG
 /* Load modules into 32bit-compatible segment. */
-#define MODULE_START   CKSSEG
-#define MODULE_END (FIXADDR_START-2*PAGE_SIZE)
+#define MODULES_VADDR  CKSSEG
+#define MODULES_END(FIXADDR_START-2*PAGE_SIZE)
 #endif
 
 #define pte_ERROR(e) \
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 7b2fbaa9cac5..9a6c96014904 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -31,10 +31,10 @@ struct mips_hi16 {
 static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
-#ifdef MODULE_START
+#ifdef MODULES_VADDR
 void *module_alloc(unsigned long size)
 {
-   return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index aaa9a242ebba..37fedeaca2e9 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -83,8 +83,8 @@ static void __do_page_fault(struct pt_regs *regs, unsigned 
long write,
 
if (unlikely(address >= VMALLOC_START && address <= VMALLOC_END))
goto VMALLOC_FAULT_TARGET;
-#ifdef MODULE_START
-   if (unlikely(address >= MODULE_START && address < MODULE_END))
+#ifdef MODULES_VADDR
+   if (unlikely(address >= MODULES_VADDR && address < MODULES_END))
goto VMALLOC_FAULT_TARGET;
 #endif
 
-- 
2.43.0

[PATCH v8 01/17] arm64: module: remove unneeded call to kasan_alloc_module_shadow()

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Since commit f6f37d9320a1 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS
modes") KASAN_VMALLOC is always enabled when KASAN is on. This means
that allocations in module_alloc() will be tracked by KASAN protection
for vmalloc() and that kasan_alloc_module_shadow() will be always an
empty inline and there is no point in calling it.

Drop meaningless call to kasan_alloc_module_shadow() from
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm64/kernel/module.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 47e0be610bb6..e92da4da1b2a 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -141,11 +141,6 @@ void *module_alloc(unsigned long size)
__func__);
}
 
-   if (p && (kasan_alloc_module_shadow(p, size, GFP_KERNEL) < 0)) {
-   vfree(p);
-   return NULL;
-   }
-
/* Memory is intended to be executable, reset the pointer tag. */
return kasan_reset_tag(p);
 }
-- 
2.43.0

[PATCH v8 00/17] mm: jit/text allocator

2024-05-05 Thread Mike Rapoport

From: "Mike Rapoport (IBM)" 

Hi,

The patches are also available in git:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v8

v8:
* fix intialization of default_execmem_info

v7: https://lore.kernel.org/all/20240429121620.1186447-1-r...@kernel.org
* define MODULE_{VADDR,END} for riscv32 to fix the build and avoid
  #ifdefs in a function body
* add Acks, thanks everybody

v6: https://lore.kernel.org/all/20240426082854.7355-1-r...@kernel.org
* restore patch "arm64: extend execmem_info for generated code
  allocations" that disappeared in v5 rebase
* update execmem initialization so that by default it will be
  initialized early while late initialization will be an opt-in

v5: https://lore.kernel.org/all/20240422094436.3625171-1-r...@kernel.org
* rebase on v6.9-rc4 to avoid a conflict in kprobes
* add copyrights to mm/execmem.c (Luis)
* fix spelling (Ingo)
* define MODULES_VADDDR for sparc (Sam)
* consistently initialize struct execmem_info (Peter)
* reduce #ifdefs in function bodies in kprobes (Masami) 

v4: https://lore.kernel.org/all/20240411160051.2093261-1-r...@kernel.org
* rebase on v6.9-rc2
* rename execmem_params to execmem_info and execmem_arch_params() to
  execmem_arch_setup()
* use single execmem_alloc() API instead of execmem_{text,data}_alloc() (Song)
* avoid extra copy of execmem parameters (Rick)
* run execmem_init() as core_initcall() except for the architectures that
  may allocated text really early (currently only x86) (Will)
* add acks for some of arm64 and riscv changes, thanks Will and Alexandre
* new commits:
  - drop call to kasan_alloc_module_shadow() on arm64 because it's not
needed anymore
  - rename MODULE_START to MODULES_VADDR on MIPS
  - use CONFIG_EXECMEM instead of CONFIG_MODULES on powerpc as per Christophe:
https://lore.kernel.org/all/79062fa3-3402-47b3-8920-9231ad05e...@csgroup.eu/

v3: https://lore.kernel.org/all/20230918072955.2507221-1-r...@kernel.org
* add type parameter to execmem allocation APIs
* remove BPF dependency on modules

v2: https://lore.kernel.org/all/20230616085038.4121892-1-r...@kernel.org
* Separate "module" and "others" allocations with execmem_text_alloc()
and jit_text_alloc()
* Drop ROX entailment on x86
* Add ack for nios2 changes, thanks Dinh Nguyen

v1: https://lore.kernel.org/all/20230601101257.530867-1-r...@kernel.org

= Cover letter from v1 (sligtly updated) =

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystmes
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

A centralized infrastructure for code allocation allows allocations of
executable memory as ROX, and future optimizations such as caching large
pages for better iTLB performance and providing sub-page allocations for
users that only need small jit code snippets.

Rick Edgecombe proposed perm_alloc extension to vmalloc [1] and Song Liu
proposed execmem_alloc [2], but both these approaches were targeting BPF
allocations and lacked the ground work to abstract executable allocations
and split them from the modules core.

Thomas Gleixner suggested to express module allocation restrictions and
requirements as struct mod_alloc_type_params [3] that would define ranges,
protections and other parameters for different types of allocations used by
modules and following that suggestion Song separated allocations of
different types in modules (commit ac3b43283923 ("module: replace
module_layout with module_memory")) and posted "Type aware module
allocator" set [4].

I liked the idea of parametrising code allocation requirements as a
structure, but I believe the original proposal and Song's module allocator
was too module centric, so I came up with these patches.

This set splits code allocation from modules by introducing execmem_alloc()
and and execmem_free(), APIs, replaces call sites of module_alloc() and
module_memfree() with the new APIs and implements core text and related
allocations in a central place.

Instead of architecture specific overrides for module_alloc(), the
architectures that require non-default behaviour for text allocation must
fill execmem_info structure and implement execmem_arch_setup() that returns
a pointer to that structure. If an architecture does not implement
execmem_arch_setup(), the defaults compatible with the current
modules::module_alloc() are used.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem APIs
take a type argument, that will be used to identify the calling subsystem
and to allow architectures to define par

56 matches

Mail list logo