[PATCH v2] tracing/probes: Fix memory leak in traceprobe_parse_probe_arg_body

2024-04-26 Thread lumingyindetect
From: LuMingYin 

If traceprobe_parse_probe_arg_body() fails to allocate 'parg->fmt', it
jumps to 'out' instead of 'fail' by mistake. In the result, in this
case the 'tmp' buffer is not freed and leaks its memory.

Fix it by jumping to 'fail' in that case.

Fixes: 032330abd08b ("tracing/probes: Cleanup probe argument parser")
Signed-off-by: LuMingYin 
---
 kernel/trace/trace_probe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index dfe3ee6035ec..42bc0f362226 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char 
*argv, ssize_t *size,
parg->fmt = kmalloc(len, GFP_KERNEL);
if (!parg->fmt) {
ret = -ENOMEM;
-   goto out;
+   goto fail;
}
snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
 parg->count);
-- 
2.25.1




Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-26 Thread Dmitry Baryshkov
On Sat, 27 Apr 2024 at 04:03, Chris Lew  wrote:
>
>
>
> On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:
> > diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
> > b/drivers/remoteproc/qcom_q6v5_adsp.c
> > index 1d24c9b656a8..02d0c626b03b 100644
> > --- a/drivers/remoteproc/qcom_q6v5_adsp.c
> > +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
> > @@ -23,6 +23,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >
> > @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
> >   int ret;
> >   unsigned int val;
> >
> > - ret = qcom_q6v5_prepare(>q6v5);
> > + ret = qcom_pdm_get();
> >   if (ret)
> >   return ret;
>
> Would it make sense to try and model this as a rproc subdev? This
> section of the remoteproc code seems to be focused on making specific
> calls to setup and enable hardware resources, where as pd mapper is
> software.
>
> sysmon and ssr are also purely software and they are modeled as subdevs
> in qcom_common. I'm not an expert on remoteproc organization but this
> was just a thought.

Well, the issue is that the pd-mapper is a global, not a per-remoteproc instance

>
> Thanks!
> Chris
>
> >
> > + ret = qcom_q6v5_prepare(>q6v5);
> > + if (ret)
> > + goto put_pdm;
> > +
> >   ret = adsp_map_carveout(rproc);
> >   if (ret) {
> >   dev_err(adsp->dev, "ADSP smmu mapping failed\n");
> > @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
> >   adsp_unmap_carveout(rproc);
> >   disable_irqs:
> >   qcom_q6v5_unprepare(>q6v5);
> > +put_pdm:
> > + qcom_pdm_release();
> >
> >   return ret;
> >   }
>


-- 
With best wishes
Dmitry



Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-26 Thread Chris Lew




On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:

diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
b/drivers/remoteproc/qcom_q6v5_adsp.c
index 1d24c9b656a8..02d0c626b03b 100644
--- a/drivers/remoteproc/qcom_q6v5_adsp.c
+++ b/drivers/remoteproc/qcom_q6v5_adsp.c
@@ -23,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
@@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)

int ret;
unsigned int val;
  
-	ret = qcom_q6v5_prepare(>q6v5);

+   ret = qcom_pdm_get();
if (ret)
return ret;


Would it make sense to try and model this as a rproc subdev? This 
section of the remoteproc code seems to be focused on making specific 
calls to setup and enable hardware resources, where as pd mapper is 
software.


sysmon and ssr are also purely software and they are modeled as subdevs 
in qcom_common. I'm not an expert on remoteproc organization but this 
was just a thought.


Thanks!
Chris

  
+	ret = qcom_q6v5_prepare(>q6v5);

+   if (ret)
+   goto put_pdm;
+
ret = adsp_map_carveout(rproc);
if (ret) {
dev_err(adsp->dev, "ADSP smmu mapping failed\n");
@@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
adsp_unmap_carveout(rproc);
  disable_irqs:
qcom_q6v5_unprepare(>q6v5);
+put_pdm:
+   qcom_pdm_release();
  
  	return ret;

  }





Re: [PATCH v7 5/6] soc: qcom: add pd-mapper implementation

2024-04-26 Thread Chris Lew




On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:


+static int qcom_pdm_start(void)
+{
+   const struct of_device_id *match;
+   const struct qcom_pdm_domain_data * const *domains;
+   struct device_node *root;
+   int ret, i;
+
+   root = of_find_node_by_path("/");
+   if (!root)
+   return -ENODEV;
+
+   match = of_match_node(qcom_pdm_domains, root);
+   of_node_put(root);
+   if (!match) {
+   pr_notice("PDM: no support for the platform, userspace daemon might 
be required.\n");
+   return -ENODEV;
+   }
+
+   domains = match->data;
+   if (!domains) {
+   pr_debug("PDM: no domains\n");
+   return -ENODEV;
+   }
+
+   mutex_lock(_pdm_mutex);
+   for (i = 0; domains[i]; i++) {
+   ret = qcom_pdm_add_domain(domains[i]);
+   if (ret)
+   goto free_domains;
+   }
+
+   ret = qmi_handle_init(_pdm_handle, 1024,
+ NULL, qcom_pdm_msg_handlers);


1024 here seems arbitrary, I think most other usage of qmi_handle_init 
has a macro defined for the max message length of the qmi service.



+   if (ret)
+   goto free_domains;
+
+   ret = qmi_add_server(_pdm_handle, SERVREG_LOCATOR_SERVICE,
+SERVREG_QMI_VERSION, SERVREG_QMI_INSTANCE);
+   if (ret) {
+   pr_err("PDM: error adding server %d\n", ret);
+   goto release_handle;
+   }
+   mutex_unlock(_pdm_mutex);
+
+   return 0;
+
+release_handle:
+   qmi_handle_release(_pdm_handle);
+
+free_domains:
+   qcom_pdm_free_domains();
+   mutex_unlock(_pdm_mutex);
+
+   return ret;
+}
+
+static void qcom_pdm_stop(void)
+{
+   qmi_del_server(_pdm_handle, SERVREG_LOCATOR_SERVICE,
+  SERVREG_QMI_VERSION, SERVREG_QMI_INSTANCE);
+
+   qmi_handle_release(_pdm_handle);
+


I don't think doing an explicit qmi_del_server() is necessary. As part 
of the qmi_handle_release(), the qrtr socket will be closed and the qrtr 
ns will broadcast a DEL_SERVER and DEL_CLIENT notification as part of 
the cleanup.



+   qcom_pdm_free_domains();
+}
+




Re: [PATCH v6 00/16] mm: jit/text allocator

2024-04-26 Thread Luis Chamberlain
On Fri, Apr 26, 2024 at 11:28:38AM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> The patches are also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v6
> 
> v6 changes:
> * restore patch "arm64: extend execmem_info for generated code
>   allocations" that disappeared in v5 rebase
> * update execmem initialization so that by default it will be
>   initialized early while late initialization will be an opt-in

I've taken this new iteration again through modules-next so to help get
more testing exposure to this. If we run into conflicts again with mm
we can see if Andrew is willing to take this in through there. However,
it may make sense to only consider up to "mm: introduce execmem_alloc() and
execmem_free()" for v6.10 given we're bound to likely find more issues
and we are already at rc5.

  Luis



Re: [RFC][PATCH] uprobe: support for private hugetlb mappings

2024-04-26 Thread Guillaume Morin
On 26 Apr  9:19, David Hildenbrand wrote:
> A couple of points:
> 
> a) Don't use page_mapcount(). Either folio_mapcount(), but likely you want
> to check PageAnonExclusive.
> 
> b) If you're not following the can_follow_write_pte/_pmd model, you are
> doing something wrong :)
> 
> c) The code was heavily changed in mm/mm-unstable. It was merged with t
> the common code.
> 
> Likely, in mm/mm-unstable, the existing can_follow_write_pte and
> can_follow_write_pmd checks will already cover what you want in most cases.
> 
> We'd need a can_follow_write_pud() to cover follow_huge_pud() and
> (unfortunately) something to handle follow_hugepd() as well similarly.
> 
> Copy-pasting what we do in can_follow_write_pte() and adjusting for
> different PTE types is the right thing to do. Maybe now it's time to factor
> out the common checks into a separate helper.

I tried to get the hugepd stuff right but this was the first I heard
about it :-) Afaict follow_huge_pmd and friends were already DTRT

diff --git a/mm/gup.c b/mm/gup.c
index 2f7baf96f65..9df97ea555f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -517,6 +517,34 @@ static int record_subpages(struct page *page, unsigned 
long sz,
 }
 #endif /* CONFIG_ARCH_HAS_HUGEPD || CONFIG_HAVE_GUP_FAST */
 
+/* Common code for can_follow_write_* */
+static inline bool can_follow_write_common(struct page *page,
+  struct vm_area_struct *vma,
+  unsigned int flags)
+{
+   /* Maybe FOLL_FORCE is set to override it? */
+   if (!(flags & FOLL_FORCE))
+   return false;
+
+   /* But FOLL_FORCE has no effect on shared mappings */
+   if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+   return false;
+
+   /* ... or read-only private ones */
+   if (!(vma->vm_flags & VM_MAYWRITE))
+   return false;
+
+   /* ... or already writable ones that just need to take a write fault */
+   if (vma->vm_flags & VM_WRITE)
+   return false;
+
+   /*
+* See can_change_pte_writable(): we broke COW and could map the page
+* writable if we have an exclusive anonymous page ...
+*/
+   return page && PageAnon(page) && PageAnonExclusive(page);
+}
+
 #ifdef CONFIG_ARCH_HAS_HUGEPD
 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
  unsigned long sz)
@@ -525,6 +553,17 @@ static unsigned long hugepte_addr_end(unsigned long addr, 
unsigned long end,
return (__boundary - 1 < end - 1) ? __boundary : end;
 }
 
+static inline bool can_follow_write_hugepd(pte_t pte, struct page* page,
+  struct vm_area_struct *vma,
+  unsigned int flags)
+{
+   /* If the pte is writable, we can write to the page. */
+   if (pte_access_permitted(pte, flags & FOLL_WRITE))
+   return true;
+
+   return can_follow_write_common(page, vma, flags);
+}
+
 static int gup_fast_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
unsigned long end, unsigned int flags, struct page **pages,
int *nr)
@@ -541,13 +580,13 @@ static int gup_fast_hugepte(pte_t *ptep, unsigned long 
sz, unsigned long addr,
 
pte = huge_ptep_get(ptep);
 
-   if (!pte_access_permitted(pte, flags & FOLL_WRITE))
-   return 0;
-
/* hugepages are never "special" */
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
page = pte_page(pte);
+   if (!can_follow_write_hugepd(pte, page, vma, flags))
+   return 0;
+
refs = record_subpages(page, sz, addr, end, pages + *nr);
 
folio = try_grab_folio(page, refs, flags);
@@ -668,7 +707,25 @@ static struct page *no_page_table(struct vm_area_struct 
*vma,
return NULL;
 }
 
+
+
 #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
+/* FOLL_FORCE can write to even unwritable PUDs in COW mappings. */
+static inline bool can_follow_write_pud(pud_t pud, struct page *page,
+   struct vm_area_struct *vma,
+   unsigned int flags)
+{
+   /* If the pud is writable, we can write to the page. */
+   if (pud_write(pud))
+   return true;
+
+   if (!can_follow_write_common(page, vma, flags))
+   return false;
+
+   /* ... and a write-fault isn't required for other reasons. */
+   return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
+}
+
 static struct page *follow_huge_pud(struct vm_area_struct *vma,
unsigned long addr, pud_t *pudp,
int flags, struct follow_page_context *ctx)
@@ -681,7 +738,8 @@ static struct page *follow_huge_pud(struct vm_area_struct 
*vma,
 
assert_spin_locked(pud_lockptr(mm, pudp));
 
-   if ((flags & FOLL_WRITE) && !pud_write(pud))
+   if ((flags & FOLL_WRITE) &&
+   

[PATCH v9 2/5] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-04-26 Thread Andrew Davis
From: Martyn Welch 

The AM62x and AM64x SoCs of the TI K3 family has a Cortex M4F core in
the MCU domain. This core is typically used for safety applications in a
stand alone mode. However, some application (non safety related) may
want to use the M4F core as a generic remote processor with IPC to the
host processor. The M4F core has internal IRAM and DRAM memories and are
exposed to the system bus for code and data loading.

A remote processor driver is added to support this subsystem, including
being able to load and boot the M4F core. Loading includes to M4F
internal memories and predefined external code/data memories. The
carve outs for external contiguous memory is defined in the M4F device
node and should match with the external memory declarations in the M4F
image binary. The M4F subsystem has two resets. One reset is for the
entire subsystem i.e including the internal memories and the other, a
local reset is only for the M4F processing core. When loading the image,
the driver first releases the subsystem reset, loads the firmware image
and then releases the local reset to let the M4F processing core run.

Signed-off-by: Martyn Welch 
Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
---
 drivers/remoteproc/Kconfig   |  13 +
 drivers/remoteproc/Makefile  |   1 +
 drivers/remoteproc/ti_k3_m4_remoteproc.c | 785 +++
 3 files changed, 799 insertions(+)
 create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa852..1a7c0330c91a9 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
  It's safe to say N here if you're not interested in utilizing
  the DSP slave processors.
 
+config TI_K3_M4_REMOTEPROC
+   tristate "TI K3 M4 remoteproc support"
+   depends on ARCH_K3 || COMPILE_TEST
+   select MAILBOX
+   select OMAP2PLUS_MBOX
+   help
+ Say m here to support TI's M4 remote processor subsystems
+ on various TI K3 family of SoCs through the remote processor
+ framework.
+
+ It's safe to say N here if you're not interested in utilizing
+ a remote processor.
+
 config TI_K3_R5_REMOTEPROC
tristate "TI K3 R5 remoteproc support"
depends on ARCH_K3
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index 91314a9b43cef..5ff4e2fee4abd 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC)   += st_remoteproc.o
 obj-$(CONFIG_ST_SLIM_REMOTEPROC)   += st_slim_rproc.o
 obj-$(CONFIG_STM32_RPROC)  += stm32_rproc.o
 obj-$(CONFIG_TI_K3_DSP_REMOTEPROC) += ti_k3_dsp_remoteproc.o
+obj-$(CONFIG_TI_K3_M4_REMOTEPROC)  += ti_k3_m4_remoteproc.o
 obj-$(CONFIG_TI_K3_R5_REMOTEPROC)  += ti_k3_r5_remoteproc.o
 obj-$(CONFIG_XLNX_R5_REMOTEPROC)   += xlnx_r5_remoteproc.o
diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
b/drivers/remoteproc/ti_k3_m4_remoteproc.c
new file mode 100644
index 0..0030e509f6b5d
--- /dev/null
+++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
@@ -0,0 +1,785 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * TI K3 Cortex-M4 Remote Processor(s) driver
+ *
+ * Copyright (C) 2021-2024 Texas Instruments Incorporated - https://www.ti.com/
+ * Hari Nagalla 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "omap_remoteproc.h"
+#include "remoteproc_internal.h"
+#include "ti_sci_proc.h"
+
+/**
+ * struct k3_m4_rproc_mem - internal memory structure
+ * @cpu_addr: MPU virtual address of the memory region
+ * @bus_addr: Bus address used to access the memory region
+ * @dev_addr: Device address of the memory region from remote processor view
+ * @size: Size of the memory region
+ */
+struct k3_m4_rproc_mem {
+   void __iomem *cpu_addr;
+   phys_addr_t bus_addr;
+   u32 dev_addr;
+   size_t size;
+};
+
+/**
+ * struct k3_m4_rproc_mem_data - memory definitions for a remote processor
+ * @name: name for this memory entry
+ * @dev_addr: device address for the memory entry
+ */
+struct k3_m4_rproc_mem_data {
+   const char *name;
+   const u32 dev_addr;
+};
+
+/**
+ * struct k3_m4_rproc_dev_data - device data structure for a remote processor
+ * @mems: pointer to memory definitions for a remote processor
+ * @num_mems: number of memory regions in @mems
+ * @uses_lreset: flag to denote the need for local reset management
+ */
+struct k3_m4_rproc_dev_data {
+   const struct k3_m4_rproc_mem_data *mems;
+   u32 num_mems;
+   bool uses_lreset;
+};
+
+/**
+ * struct k3_m4_rproc - k3 remote processor driver structure
+ * @dev: cached device pointer
+ * @rproc: remoteproc device handle
+ * @mem: internal memory regions data
+ * @num_mems: number of internal memory regions
+ * @rmem: reserved 

[PATCH v9 0/5] TI K3 M4F support on AM62 SoCs

2024-04-26 Thread Andrew Davis
Hello all,

This is the continuation of the M4F RProc support series from here[0].
I'm helping out with the upstream task for Hari and so this version(v8)
is a little different than the previous(v7) postings[0]. Most notable
change I've introduced being the patches factoring out common support
from the current K3 R5 and DSP drivers have been dropped. I'd like
to do that re-factor *after* getting this driver in shape, that way
we have 3 similar drivers to factor out from vs trying to make those
changes in parallel with the series adding M4 support.

Anyway, details on our M4F subsystem can be found the
the AM62 TRM in the section on the same:

AM62x Technical Reference Manual (SPRUIV7A – MAY 2022)
https://www.ti.com/lit/pdf/SPRUIV7A

Thanks,
Andrew

[0] 
https://lore.kernel.org/linux-arm-kernel/20240202175538.1705-5-hnaga...@ti.com/T/

Changes for v9:
 - Fixed reserved-memory.yaml text in [1/5]
 - Split dts patch into one for SoC and one for board enable
 - Corrected DT property order and formatting [4/5][5/5]

Hari Nagalla (4):
  dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs
  arm64: dts: ti: k3-am62: Add M4F remoteproc node
  arm64: dts: ti: k3-am625-sk: Add M4F remoteproc node
  arm64: defconfig: Enable TI K3 M4 remoteproc driver

Martyn Welch (1):
  remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

 .../bindings/remoteproc/ti,k3-m4f-rproc.yaml  | 125 +++
 arch/arm64/boot/dts/ti/k3-am62-mcu.dtsi   |  13 +
 .../arm64/boot/dts/ti/k3-am62x-sk-common.dtsi |  19 +
 arch/arm64/configs/defconfig  |   1 +
 drivers/remoteproc/Kconfig|  13 +
 drivers/remoteproc/Makefile   |   1 +
 drivers/remoteproc/ti_k3_m4_remoteproc.c  | 785 ++
 7 files changed, 957 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml
 create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c

-- 
2.39.2




[PATCH v9 3/5] arm64: dts: ti: k3-am62: Add M4F remoteproc node

2024-04-26 Thread Andrew Davis
From: Hari Nagalla 

The AM62x SoCs of the TI K3 family have a Cortex M4F core in the MCU
domain. This core can be used by non safety applications as a remote
processor. When used as a remote processor with virtio/rpmessage IPC,
two carveout reserved memory nodes are needed.

Disable by default as this node is not complete until mailbox data
is provided in the board level DT.

Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
---
 arch/arm64/boot/dts/ti/k3-am62-mcu.dtsi | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/boot/dts/ti/k3-am62-mcu.dtsi 
b/arch/arm64/boot/dts/ti/k3-am62-mcu.dtsi
index e66d486ef1f21..7f6f0007e8e81 100644
--- a/arch/arm64/boot/dts/ti/k3-am62-mcu.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-am62-mcu.dtsi
@@ -173,4 +173,17 @@ mcu_mcan1: can@4e18000 {
bosch,mram-cfg = <0x0 128 64 64 64 64 32 32>;
status = "disabled";
};
+
+   mcu_m4fss: m4fss@500 {
+   compatible = "ti,am64-m4fss";
+   reg = <0x00 0x500 0x00 0x3>,
+ <0x00 0x504 0x00 0x1>;
+   reg-names = "iram", "dram";
+   resets = <_reset 9 1>;
+   firmware-name = "am62-mcu-m4f0_0-fw";
+   ti,sci = <>;
+   ti,sci-dev-id = <9>;
+   ti,sci-proc-ids = <0x18 0xff>;
+   status = "disabled";
+   };
 };
-- 
2.39.2




[PATCH v9 5/5] arm64: defconfig: Enable TI K3 M4 remoteproc driver

2024-04-26 Thread Andrew Davis
From: Hari Nagalla 

Some K3 platform devices (AM64x, AM62x) have a Cortex M4 core. Build
the M4 remote proc driver as a module for these platforms.

Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 2c30d617e1802..04e8e2ca4aa10 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -1352,6 +1352,7 @@ CONFIG_QCOM_Q6V5_PAS=m
 CONFIG_QCOM_SYSMON=m
 CONFIG_QCOM_WCNSS_PIL=m
 CONFIG_TI_K3_DSP_REMOTEPROC=m
+CONFIG_TI_K3_M4_REMOTEPROC=m
 CONFIG_TI_K3_R5_REMOTEPROC=m
 CONFIG_RPMSG_CHAR=m
 CONFIG_RPMSG_CTRL=m
-- 
2.39.2




[PATCH v9 1/5] dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs

2024-04-26 Thread Andrew Davis
From: Hari Nagalla 

K3 AM64x SoC has a Cortex M4F subsystem in the MCU voltage domain.
The remote processor's life cycle management and IPC mechanisms are
similar across the R5F and M4F cores from remote processor driver
point of view. However, there are subtle differences in image loading
and starting the M4F subsystems.

The YAML binding document provides the various node properties to be
configured by the consumers of the M4F subsystem.

Signed-off-by: Martyn Welch 
Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
Reviewed-by: Conor Dooley 
---
 .../bindings/remoteproc/ti,k3-m4f-rproc.yaml  | 125 ++
 1 file changed, 125 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml

diff --git a/Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml 
b/Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml
new file mode 100644
index 0..2bd0752b6ba9e
--- /dev/null
+++ b/Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml
@@ -0,0 +1,125 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/remoteproc/ti,k3-m4f-rproc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: TI K3 M4F processor subsystems
+
+maintainers:
+  - Hari Nagalla 
+  - Mathieu Poirier 
+
+description: |
+  Some K3 family SoCs have Arm Cortex M4F cores. AM64x is a SoC in K3
+  family with a M4F core. Typically safety oriented applications may use
+  the M4F core in isolation without an IPC. Where as some industrial and
+  home automation applications, may use the M4F core as a remote processor
+  with IPC communications.
+
+$ref: /schemas/arm/keystone/ti,k3-sci-common.yaml#
+
+properties:
+  compatible:
+enum:
+  - ti,am64-m4fss
+
+  power-domains:
+maxItems: 1
+
+  "#address-cells":
+const: 2
+
+  "#size-cells":
+const: 2
+
+  reg:
+items:
+  - description: IRAM internal memory region
+  - description: DRAM internal memory region
+
+  reg-names:
+items:
+  - const: iram
+  - const: dram
+
+  resets:
+maxItems: 1
+
+  firmware-name:
+maxItems: 1
+description: Name of firmware to load for the M4F core
+
+  mboxes:
+description:
+  OMAP Mailbox specifier denoting the sub-mailbox, to be used for
+  communication with the remote processor. This property should match
+  with the sub-mailbox node used in the firmware image.
+maxItems: 1
+
+  memory-region:
+description:
+  phandle to the reserved memory nodes to be associated with the
+  remoteproc device. Optional memory regions available for firmware
+  specific purposes.
+  (see reserved-memory/reserved-memory.yaml in dtschema project)
+maxItems: 8
+items:
+  - description: regions used for DMA allocations like vrings, vring 
buffers
+ and memory dedicated to firmware's specific purposes.
+additionalItems: true
+
+required:
+  - compatible
+  - reg
+  - reg-names
+  - ti,sci
+  - ti,sci-dev-id
+  - ti,sci-proc-ids
+  - resets
+  - firmware-name
+
+unevaluatedProperties: false
+
+examples:
+  - |
+reserved-memory {
+#address-cells = <2>;
+#size-cells = <2>;
+
+mcu_m4fss_dma_memory_region: m4f-dma-memory@9cb0 {
+compatible = "shared-dma-pool";
+reg = <0x00 0x9cb0 0x00 0x10>;
+no-map;
+};
+
+mcu_m4fss_memory_region: m4f-memory@9cc0 {
+compatible = "shared-dma-pool";
+reg = <0x00 0x9cc0 0x00 0xe0>;
+no-map;
+};
+};
+
+soc {
+#address-cells = <2>;
+#size-cells = <2>;
+
+mailbox0_cluster0: mailbox-0 {
+#mbox-cells = <1>;
+};
+
+remoteproc@500 {
+compatible = "ti,am64-m4fss";
+reg = <0x00 0x500 0x00 0x3>,
+  <0x00 0x504 0x00 0x1>;
+reg-names = "iram", "dram";
+resets = <_reset 9 1>;
+firmware-name = "am62-mcu-m4f0_0-fw";
+mboxes = <_cluster0>, <_m4_0>;
+memory-region = <_m4fss_dma_memory_region>,
+<_m4fss_memory_region>;
+ti,sci = <>;
+ti,sci-dev-id = <9>;
+ti,sci-proc-ids = <0x18 0xff>;
+ };
+};
-- 
2.39.2




[PATCH v9 4/5] arm64: dts: ti: k3-am625-sk: Add M4F remoteproc node

2024-04-26 Thread Andrew Davis
From: Hari Nagalla 

The AM62x SoCs of the TI K3 family have a Cortex M4F core in the MCU
domain. This core can be used by non safety applications as a remote
processor. When used as a remote processor with virtio/rpmessage IPC,
two carveout reserved memory nodes are needed. The first region is used
as a DMA pool for the rproc device, and the second region will furnish
the static carveout regions for the firmware memory.

The current carveout addresses and sizes are defined statically for
each rproc device. The M4F processor does not have an MMU, and as such
requires the exact memory used by the firmware to be set-aside.

Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
---
 .../arm64/boot/dts/ti/k3-am62x-sk-common.dtsi | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi 
b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
index 3c45782ab2b78..167bec5c80006 100644
--- a/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi
@@ -48,6 +48,18 @@ ramoops@9ca0 {
pmsg-size = <0x8000>;
};
 
+   mcu_m4fss_dma_memory_region: m4f-dma-memory@9cb0 {
+   compatible = "shared-dma-pool";
+   reg = <0x00 0x9cb0 0x00 0x10>;
+   no-map;
+   };
+
+   mcu_m4fss_memory_region: m4f-memory@9cc0 {
+   compatible = "shared-dma-pool";
+   reg = <0x00 0x9cc0 0x00 0xe0>;
+   no-map;
+   };
+
secure_tfa_ddr: tfa@9e78 {
reg = <0x00 0x9e78 0x00 0x8>;
alignment = <0x1000>;
@@ -457,6 +469,13 @@ mbox_m4_0: mbox-m4-0 {
};
 };
 
+_m4fss {
+   mboxes = <_cluster0 _m4_0>;
+   memory-region = <_m4fss_dma_memory_region>,
+   <_m4fss_memory_region>;
+   status = "okay";
+};
+
  {
bootph-all;
status = "okay";
-- 
2.39.2




Re: [PATCH v6 08/16] mm/execmem, arch: convert remaining overrides of module_alloc to execmem

2024-04-26 Thread Song Liu
On Fri, Apr 26, 2024 at 1:30 AM Mike Rapoport  wrote:
>
> From: "Mike Rapoport (IBM)" 
>
> Extend execmem parameters to accommodate more complex overrides of
> module_alloc() by architectures.
>
> This includes specification of a fallback range required by arm, arm64
> and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for
> allocation of KASAN shadow required by s390 and x86 and support for
> late initialization of execmem required by arm64.
>
> The core implementation of execmem_alloc() takes care of suppressing
> warnings when the initial allocation fails but there is a fallback range
> defined.
>
> Signed-off-by: Mike Rapoport (IBM) 
> Acked-by: Will Deacon 

nit: We should probably move the logic for ARCH_WANTS_EXECMEM_LATE
to a separate patch.

Otherwise,

Acked-by: Song Liu 



Re: [PATCH v6 07/16] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2024-04-26 Thread Song Liu
On Fri, Apr 26, 2024 at 1:30 AM Mike Rapoport  wrote:
>
> From: "Mike Rapoport (IBM)" 
>
> Several architectures override module_alloc() only to define address
> range for code allocations different than VMALLOC address space.
>
> Provide a generic implementation in execmem that uses the parameters for
> address space ranges, required alignment and page protections provided
> by architectures.
>
> The architectures must fill execmem_info structure and implement
> execmem_arch_setup() that returns a pointer to that structure. This way the
> execmem initialization won't be called from every architecture, but rather
> from a central place, namely a core_initcall() in execmem.
>
> The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
> with the parameters defined by the architectures.  If an architecture does
> not implement execmem_arch_setup(), execmem_alloc() will fall back to
> module_alloc().
>
> Signed-off-by: Mike Rapoport (IBM) 

Acked-by: Song Liu 



Re: [PATCH v6 06/16] mm: introduce execmem_alloc() and execmem_free()

2024-04-26 Thread Song Liu
On Fri, Apr 26, 2024 at 1:30 AM Mike Rapoport  wrote:
>
> From: "Mike Rapoport (IBM)" 
>
> module_alloc() is used everywhere as a mean to allocate memory for code.
>
> Beside being semantically wrong, this unnecessarily ties all subsystems
> that need to allocate code, such as ftrace, kprobes and BPF to modules and
> puts the burden of code allocation to the modules code.
>
> Several architectures override module_alloc() because of various
> constraints where the executable memory can be located and this causes
> additional obstacles for improvements of code allocation.
>
> Start splitting code allocation from modules by introducing execmem_alloc()
> and execmem_free() APIs.
>
> Initially, execmem_alloc() is a wrapper for module_alloc() and
> execmem_free() is a replacement of module_memfree() to allow updating all
> call sites to use the new APIs.
>
> Since architectures define different restrictions on placement,
> permissions, alignment and other parameters for memory that can be used by
> different subsystems that allocate executable memory, execmem_alloc() takes
> a type argument, that will be used to identify the calling subsystem and to
> allow architectures define parameters for ranges suitable for that
> subsystem.
>
> No functional changes.
>
> Signed-off-by: Mike Rapoport (IBM) 
> Acked-by: Masami Hiramatsu (Google) 

Acked-by: Song Liu 



Re: [PATCH v6 05/16] module: make module_memory_{alloc,free} more self-contained

2024-04-26 Thread Song Liu
On Fri, Apr 26, 2024 at 1:30 AM Mike Rapoport  wrote:
>
> From: "Mike Rapoport (IBM)" 
>
> Move the logic related to the memory allocation and freeing into
> module_memory_alloc() and module_memory_free().
>
> Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Song Liu 



Re: [PATCHv3 bpf-next 6/7] selftests/bpf: Add uretprobe compat test

2024-04-26 Thread Andrii Nakryiko
On Sun, Apr 21, 2024 at 12:43 PM Jiri Olsa  wrote:
>
> Adding test that adds return uprobe inside 32 bit task
> and verify the return uprobe and attached bpf programs
> get properly executed.
>
> Signed-off-by: Jiri Olsa 
> ---
>  tools/testing/selftests/bpf/.gitignore|  1 +
>  tools/testing/selftests/bpf/Makefile  |  6 ++-
>  .../selftests/bpf/prog_tests/uprobe_syscall.c | 40 +++
>  .../bpf/progs/uprobe_syscall_compat.c | 13 ++
>  4 files changed, 59 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall_compat.c
>
> diff --git a/tools/testing/selftests/bpf/.gitignore 
> b/tools/testing/selftests/bpf/.gitignore
> index f1aebabfb017..69d71223c0dd 100644
> --- a/tools/testing/selftests/bpf/.gitignore
> +++ b/tools/testing/selftests/bpf/.gitignore
> @@ -45,6 +45,7 @@ test_cpp
>  /veristat
>  /sign-file
>  /uprobe_multi
> +/uprobe_compat
>  *.ko
>  *.tmp
>  xskxceiver
> diff --git a/tools/testing/selftests/bpf/Makefile 
> b/tools/testing/selftests/bpf/Makefile
> index edc73f8f5aef..d170b63eca62 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -134,7 +134,7 @@ TEST_GEN_PROGS_EXTENDED = test_sock_addr 
> test_skb_cgroup_id_user \
> xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata \
> xdp_features bpf_test_no_cfi.ko
>
> -TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi
> +TEST_GEN_FILES += liburandom_read.so urandom_read sign-file uprobe_multi 
> uprobe_compat

you need to add uprobe_compat to TRUNNER_EXTRA_FILES as well, no?

>
>  # Emit succinct information message describing current building step
>  # $1 - generic step name (e.g., CC, LINK, etc);
> @@ -761,6 +761,10 @@ $(OUTPUT)/uprobe_multi: uprobe_multi.c
> $(call msg,BINARY,,$@)
> $(Q)$(CC) $(CFLAGS) -O0 $(LDFLAGS) $^ $(LDLIBS) -o $@
>
> +$(OUTPUT)/uprobe_compat:
> +   $(call msg,BINARY,,$@)
> +   $(Q)echo "int main() { return 0; }" | $(CC) $(CFLAGS) -xc -m32 -O0 - 
> -o $@
> +
>  EXTRA_CLEAN := $(SCRATCH_DIR) $(HOST_SCRATCH_DIR)  \
> prog_tests/tests.h map_tests/tests.h verifier/tests.h   \
> feature bpftool \
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c 
> b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index 9233210a4c33..3770254d893b 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include "uprobe_syscall.skel.h"
>  #include "uprobe_syscall_call.skel.h"
> +#include "uprobe_syscall_compat.skel.h"
>
>  __naked unsigned long uretprobe_regs_trigger(void)
>  {
> @@ -291,6 +292,35 @@ static void test_uretprobe_syscall_call(void)
>  "read_trace_pipe_iter");
> ASSERT_EQ(found, 0, "found");
>  }
> +
> +static void trace_pipe_compat_cb(const char *str, void *data)
> +{
> +   if (strstr(str, "uretprobe compat") != NULL)
> +   (*(int *)data)++;
> +}
> +
> +static void test_uretprobe_compat(void)
> +{
> +   struct uprobe_syscall_compat *skel = NULL;
> +   int err, found = 0;
> +
> +   skel = uprobe_syscall_compat__open_and_load();
> +   if (!ASSERT_OK_PTR(skel, "uprobe_syscall_compat__open_and_load"))
> +   goto cleanup;
> +
> +   err = uprobe_syscall_compat__attach(skel);
> +   if (!ASSERT_OK(err, "uprobe_syscall_compat__attach"))
> +   goto cleanup;
> +
> +   system("./uprobe_compat");
> +
> +   ASSERT_OK(read_trace_pipe_iter(trace_pipe_compat_cb, , 1000),
> +"read_trace_pipe_iter");

why so complicated? can't you just set global variable that it was called

> +   ASSERT_EQ(found, 1, "found");
> +
> +cleanup:
> +   uprobe_syscall_compat__destroy(skel);
> +}
>  #else
>  static void test_uretprobe_regs_equal(void)
>  {
> @@ -306,6 +336,11 @@ static void test_uretprobe_syscall_call(void)
>  {
> test__skip();
>  }
> +
> +static void test_uretprobe_compat(void)
> +{
> +   test__skip();
> +}
>  #endif
>
>  void test_uprobe_syscall(void)
> @@ -320,3 +355,8 @@ void serial_test_uprobe_syscall_call(void)
>  {
> test_uretprobe_syscall_call();
>  }
> +
> +void serial_test_uprobe_syscall_compat(void)

and then no need for serial_test?

> +{
> +   test_uretprobe_compat();
> +}
> diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_compat.c 
> b/tools/testing/selftests/bpf/progs/uprobe_syscall_compat.c
> new file mode 100644
> index ..f8adde7f08e2
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_compat.c
> @@ -0,0 +1,13 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +
> +char _license[] SEC("license") = "GPL";
> +
> +SEC("uretprobe.multi/./uprobe_compat:main")
> +int 

Re: [PATCHv3 bpf-next 5/7] selftests/bpf: Add uretprobe syscall call from user space test

2024-04-26 Thread Andrii Nakryiko
On Sun, Apr 21, 2024 at 12:43 PM Jiri Olsa  wrote:
>
> Adding test to verify that when called from outside of the
> trampoline provided by kernel, the uretprobe syscall will cause
> calling process to receive SIGILL signal and the attached bpf
> program is no executed.
>
> Signed-off-by: Jiri Olsa 
> ---
>  .../selftests/bpf/prog_tests/uprobe_syscall.c | 92 +++
>  .../selftests/bpf/progs/uprobe_syscall_call.c | 15 +++
>  2 files changed, 107 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall_call.c
>

See nits below, but overall LGTM

Acked-by: Andrii Nakryiko 

[...]

> @@ -219,6 +301,11 @@ static void test_uretprobe_regs_change(void)
>  {
> test__skip();
>  }
> +
> +static void test_uretprobe_syscall_call(void)
> +{
> +   test__skip();
> +}
>  #endif
>
>  void test_uprobe_syscall(void)
> @@ -228,3 +315,8 @@ void test_uprobe_syscall(void)
> if (test__start_subtest("uretprobe_regs_change"))
> test_uretprobe_regs_change();
>  }
> +
> +void serial_test_uprobe_syscall_call(void)

does it need to be serial? non-serial are still run sequentially
within a process (there is no multi-threading), it's more about some
global effects on system.

> +{
> +   test_uretprobe_syscall_call();
> +}
> diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_call.c 
> b/tools/testing/selftests/bpf/progs/uprobe_syscall_call.c
> new file mode 100644
> index ..5ea03bb47198
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_call.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "vmlinux.h"
> +#include 
> +#include 
> +
> +struct pt_regs regs;
> +
> +char _license[] SEC("license") = "GPL";
> +
> +SEC("uretprobe//proc/self/exe:uretprobe_syscall_call")
> +int uretprobe(struct pt_regs *regs)
> +{
> +   bpf_printk("uretprobe called");

debugging leftover? we probably don't want to pollute trace_pipe from test

> +   return 0;
> +}
> --
> 2.44.0
>



Re: [PATCHv3 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe

2024-04-26 Thread Andrii Nakryiko
On Sun, Apr 21, 2024 at 12:42 PM Jiri Olsa  wrote:
>
> Adding uretprobe syscall instead of trap to speed up return probe.
>
> At the moment the uretprobe setup/path is:
>
>   - install entry uprobe
>
>   - when the uprobe is hit, it overwrites probed function's return address
> on stack with address of the trampoline that contains breakpoint
> instruction
>
>   - the breakpoint trap code handles the uretprobe consumers execution and
> jumps back to original return address
>
> This patch replaces the above trampoline's breakpoint instruction with new
> ureprobe syscall call. This syscall does exactly the same job as the trap
> with some more extra work:
>
>   - syscall trampoline must save original value for rax/r11/rcx registers
> on stack - rax is set to syscall number and r11/rcx are changed and
> used by syscall instruction
>
>   - the syscall code reads the original values of those registers and
> restore those values in task's pt_regs area
>
>   - only caller from trampoline exposed in '[uprobes]' is allowed,
> the process will receive SIGILL signal otherwise
>
> Even with some extra work, using the uretprobes syscall shows speed
> improvement (compared to using standard breakpoint):
>
>   On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)
>
>   current:
> uretprobe-nop  :1.498 ± 0.000M/s
> uretprobe-push :1.448 ± 0.001M/s
> uretprobe-ret  :0.816 ± 0.001M/s
>
>   with the fix:
> uretprobe-nop  :1.969 ± 0.002M/s  < 31% speed up
> uretprobe-push :1.910 ± 0.000M/s  < 31% speed up
> uretprobe-ret  :0.934 ± 0.000M/s  < 14% speed up
>
>   On Amd (AMD Ryzen 7 5700U)
>
>   current:
> uretprobe-nop  :0.778 ± 0.001M/s
> uretprobe-push :0.744 ± 0.001M/s
> uretprobe-ret  :0.540 ± 0.001M/s
>
>   with the fix:
> uretprobe-nop  :0.860 ± 0.001M/s  < 10% speed up
> uretprobe-push :0.818 ± 0.001M/s  < 10% speed up
> uretprobe-ret  :0.578 ± 0.000M/s  <  7% speed up
>
> The performance test spawns a thread that runs loop which triggers
> uprobe with attached bpf program that increments the counter that
> gets printed in results above.
>
> The uprobe (and uretprobe) kind is determined by which instruction
> is being patched with breakpoint instruction. That's also important
> for uretprobes, because uprobe is installed for each uretprobe.
>
> The performance test is part of bpf selftests:
>   tools/testing/selftests/bpf/run_bench_uprobes.sh
>
> Note at the moment uretprobe syscall is supported only for native
> 64-bit process, compat process still uses standard breakpoint.
>
> Suggested-by: Andrii Nakryiko 
> Signed-off-by: Oleg Nesterov 
> Signed-off-by: Jiri Olsa 
> ---
>  arch/x86/kernel/uprobes.c | 115 ++
>  include/linux/uprobes.h   |   3 +
>  kernel/events/uprobes.c   |  24 +---
>  3 files changed, 135 insertions(+), 7 deletions(-)
>

LGTM as far as I can follow the code

Acked-by: Andrii Nakryiko 

[...]



Re: [PATCHv3 bpf-next 1/7] uprobe: Wire up uretprobe system call

2024-04-26 Thread Andrii Nakryiko
On Sun, Apr 21, 2024 at 12:42 PM Jiri Olsa  wrote:
>
> Wiring up uretprobe system call, which comes in following changes.
> We need to do the wiring before, because the uretprobe implementation
> needs the syscall number.
>
> Note at the moment uretprobe syscall is supported only for native
> 64-bit process.
>
> Signed-off-by: Jiri Olsa 
> ---
>  arch/x86/entry/syscalls/syscall_64.tbl | 1 +
>  include/linux/syscalls.h   | 2 ++
>  include/uapi/asm-generic/unistd.h  | 5 -
>  kernel/sys_ni.c| 2 ++
>  4 files changed, 9 insertions(+), 1 deletion(-)
>

LGTM

Acked-by: Andrii Nakryiko 

> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
> b/arch/x86/entry/syscalls/syscall_64.tbl
> index 7e8d46f4147f..af0a33ab06ee 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -383,6 +383,7 @@
>  459common  lsm_get_self_attr   sys_lsm_get_self_attr
>  460common  lsm_set_self_attr   sys_lsm_set_self_attr
>  461common  lsm_list_modulessys_lsm_list_modules
> +46264  uretprobe   sys_uretprobe
>
>  #
>  # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index e619ac10cd23..5318e0e76799 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -972,6 +972,8 @@ asmlinkage long sys_lsm_list_modules(u64 *ids, u32 *size, 
> u32 flags);
>  /* x86 */
>  asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int on);
>
> +asmlinkage long sys_uretprobe(void);
> +
>  /* pciconfig: alpha, arm, arm64, ia64, sparc */
>  asmlinkage long sys_pciconfig_read(unsigned long bus, unsigned long dfn,
> unsigned long off, unsigned long len,
> diff --git a/include/uapi/asm-generic/unistd.h 
> b/include/uapi/asm-generic/unistd.h
> index 75f00965ab15..8a747cd1d735 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -842,8 +842,11 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
>  #define __NR_lsm_list_modules 461
>  __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
>
> +#define __NR_uretprobe 462
> +__SYSCALL(__NR_uretprobe, sys_uretprobe)
> +
>  #undef __NR_syscalls
> -#define __NR_syscalls 462
> +#define __NR_syscalls 463
>
>  /*
>   * 32 bit systems traditionally used different
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index faad00cce269..be6195e0d078 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -391,3 +391,5 @@ COND_SYSCALL(setuid16);
>
>  /* restartable sequence */
>  COND_SYSCALL(rseq);
> +
> +COND_SYSCALL(uretprobe);
> --
> 2.44.0
>



Re: [PATCH v2 1/2] remoteproc: k3-r5: Wait for core0 power-up before powering up core1

2024-04-26 Thread Mathieu Poirier
Good day,

On Wed, Apr 24, 2024 at 06:35:03PM +0530, Beleswar Padhi wrote:
> From: Apurva Nandan 
> 
> PSC controller has a limitation that it can only power-up the second core
> when the first core is in ON state. Power-state for core0 should be equal
> to or higher than core1, else the kernel is seen hanging during rproc
> loading.
> 
> Make the powering up of cores sequential, by waiting for the current core
> to power-up before proceeding to the next core, with a timeout of 2sec.
> Add a wait queue event in k3_r5_cluster_rproc_init call, that will wait
> for the current core to be released from reset before proceeding with the
> next core.
> 
> Fixes: 6dedbd1d5443 ("remoteproc: k3-r5: Add a remoteproc driver for R5F 
> subsystem")
> 
> Signed-off-by: Apurva Nandan 

You need to add your own SoB as well.

> ---
>  drivers/remoteproc/ti_k3_r5_remoteproc.c | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c 
> b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> index ad3415a3851b..5a9bd5d4a2ea 100644
> --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
> +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> @@ -103,12 +103,14 @@ struct k3_r5_soc_data {
>   * @dev: cached device pointer
>   * @mode: Mode to configure the Cluster - Split or LockStep
>   * @cores: list of R5 cores within the cluster
> + * @core_transition: wait queue to sync core state changes
>   * @soc_data: SoC-specific feature data for a R5FSS
>   */
>  struct k3_r5_cluster {
>   struct device *dev;
>   enum cluster_mode mode;
>   struct list_head cores;
> + wait_queue_head_t core_transition;
>   const struct k3_r5_soc_data *soc_data;
>  };
>  
> @@ -128,6 +130,7 @@ struct k3_r5_cluster {
>   * @atcm_enable: flag to control ATCM enablement
>   * @btcm_enable: flag to control BTCM enablement
>   * @loczrama: flag to dictate which TCM is at device address 0x0
> + * @released_from_reset: flag to signal when core is out of reset
>   */
>  struct k3_r5_core {
>   struct list_head elem;
> @@ -144,6 +147,7 @@ struct k3_r5_core {
>   u32 atcm_enable;
>   u32 btcm_enable;
>   u32 loczrama;
> + bool released_from_reset;
>  };
>  
>  /**
> @@ -460,6 +464,8 @@ static int k3_r5_rproc_prepare(struct rproc *rproc)
>   ret);
>   return ret;
>   }
> + core->released_from_reset = true;
> + wake_up_interruptible(>core_transition);
>  
>   /*
>* Newer IP revisions like on J7200 SoCs support h/w auto-initialization
> @@ -1140,6 +1146,7 @@ static int k3_r5_rproc_configure_mode(struct 
> k3_r5_rproc *kproc)
>   return ret;
>   }
>  
> + core->released_from_reset = c_state;

I understand why this is needed but it line could be very cryptic for people
trying to understand this driver.  Please add a comment to describe what is
happening here.

>   ret = ti_sci_proc_get_status(core->tsp, _vec, , ,
>);
>   if (ret < 0) {
> @@ -1280,6 +1287,26 @@ static int k3_r5_cluster_rproc_init(struct 
> platform_device *pdev)
>   cluster->mode == CLUSTER_MODE_SINGLECPU ||
>   cluster->mode == CLUSTER_MODE_SINGLECORE)
>   break;
> +
> + /*
> +  * R5 cores require to be powered on sequentially, core0
> +  * should be in higher power state than core1 in a cluster
> +  * So, wait for current core to power up before proceeding
> +  * to next core and put timeout of 2sec for each core.
> +  *
> +  * This waiting mechanism is necessary because
> +  * rproc_auto_boot_callback() for core1 can be called before
> +  * core0 due to thread execution order.
> +  */
> + ret = wait_event_interruptible_timeout(cluster->core_transition,
> +
> core->released_from_reset,
> +msecs_to_jiffies(2000));
> + if (ret <= 0) {
> + dev_err(dev,
> + "Timed out waiting for %s core to power up!\n",
> + rproc->name);
> + return ret;
> + }
>   }
>  
>   return 0;
> @@ -1709,6 +1736,7 @@ static int k3_r5_probe(struct platform_device *pdev)
>   cluster->dev = dev;
>   cluster->soc_data = data;
>   INIT_LIST_HEAD(>cores);
> + init_waitqueue_head(>core_transition);
>  
>   ret = of_property_read_u32(np, "ti,cluster-mode", >mode);
>   if (ret < 0 && ret != -EINVAL) {
> -- 
> 2.34.1
> 



Re: [RFC PATCH v2 1/1] x86/sgx: Explicitly give up the CPU in EDMM's ioctl() to avoid softlockup

2024-04-26 Thread Dave Hansen
On 4/26/24 07:18, Bojun Zhu wrote:
>   for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
> + if (sgx_check_signal_and_resched()) {
> + if (!c)
> + ret = -ERESTARTSYS;
> +
> + goto out;
> + }

This construct is rather fugly.  Let's not perpetuate it, please.  Why
not do:

int ret = -ERESTARTSYS;

...
for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
if (sgx_check_signal_and_resched())
goto out;

Then, voila, when c==0 on the first run through the loop, you'll get a
ret=-ERESTARTSYS.

But honestly, it seems kinda silly to annotate all these loops with
explicit cond_resched()s.  I'd much rather do this once and, for
instance, just wrap the enclave locks:

- mutex_lock(>lock);
+ sgx_lock_enclave(encl);

and then have the lock function do the rescheds.  I assume that
mutex_lock() isn't doing this generically for performance reasons.  But
we don't care in SGX land and can just resched to our heart's content.




Re: [PATCH v8 1/4] dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs

2024-04-26 Thread Conor Dooley
On Thu, Apr 25, 2024 at 02:03:36PM -0500, Andrew Davis wrote:
> On 4/25/24 12:15 PM, Conor Dooley wrote:
> > On Wed, Apr 24, 2024 at 03:36:39PM -0500, Rob Herring wrote:
> > > 
> > > On Wed, 24 Apr 2024 14:06:09 -0500, Andrew Davis wrote:
> > > > From: Hari Nagalla 
> > > > 
> > > > K3 AM64x SoC has a Cortex M4F subsystem in the MCU voltage domain.
> > > > The remote processor's life cycle management and IPC mechanisms are
> > > > similar across the R5F and M4F cores from remote processor driver
> > > > point of view. However, there are subtle differences in image loading
> > > > and starting the M4F subsystems.
> > > > 
> > > > The YAML binding document provides the various node properties to be
> > > > configured by the consumers of the M4F subsystem.
> > > > 
> > > > Signed-off-by: Martyn Welch 
> > > > Signed-off-by: Hari Nagalla 
> > > > Signed-off-by: Andrew Davis 
> > > > ---
> > > >   .../bindings/remoteproc/ti,k3-m4f-rproc.yaml  | 126 ++
> > > >   1 file changed, 126 insertions(+)
> > > >   create mode 100644 
> > > > Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml
> > > > 
> > > 
> > > My bot found errors running 'make dt_binding_check' on your patch:
> > > 
> > > yamllint warnings/errors:
> > > 
> > > dtschema/dtc warnings/errors:
> > > 
> > > 
> > > doc reference errors (make refcheckdocs):
> > > Warning: 
> > > Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml 
> > > references a file that doesn't exist: 
> > > Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml
> > > Documentation/devicetree/bindings/remoteproc/ti,k3-m4f-rproc.yaml: 
> > > Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml
> > 
> > The file is now in dt-schema:
> > https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/reserved-memory.yaml
> 
> So should I use "reserved-memory/reserved-memory.yaml" here, or just
> drop this line completely?

The only other example that I could find that didn't reference the text
binding (which points to dt-schema) said:
"(see reserved-memory/reserved-memory.yaml in dtschema project)"


signature.asc
Description: PGP signature


Re: tracing/probes: Fix a memory leak in traceprobe_parse_probe_arg_body()

2024-04-26 Thread Markus Elfring
…
> 
>  If traceprobe_parse_probe_arg_body() fails to allocate 'parg->fmt', it
>  jumps to 'out' instead of 'fail' by mistake. In the result, in this
>  case the 'tmp' buffer is not freed and leaks its memory.
>
>  Fix it by jumping to 'fail' in that case.
> 
…

How do you think about another approach for a more desirable change description?

   If traceprobe_parse_probe_arg_body() failed to allocate the object 
“parg->fmt”,
   it jumps to the label “out” instead of “fail” by mistake.
   In the result, the buffer “tmp” is not freed in this case and leaks its 
memory.

   Thus jump to the label “fail” in that case (so that the exception handling
   will be improved another bit).


Regards,
Markus



Re: [PATCH 0/2] Objpool performance improvements

2024-04-26 Thread Andrii Nakryiko
On Fri, Apr 26, 2024 at 7:25 AM Masami Hiramatsu  wrote:
>
> Hi Andrii,
>
> On Wed, 24 Apr 2024 14:52:12 -0700
> Andrii Nakryiko  wrote:
>
> > Improve objpool (used heavily in kretprobe hot path) performance with two
> > improvements:
> >   - inlining performance critical objpool_push()/objpool_pop() operations;
> >   - avoiding re-calculating relatively expensive nr_possible_cpus().
>
> Thanks for optimizing objpool. Both looks good to me.

Great, thanks for applying.

>
> BTW, I don't intend to stop this short-term optimization attempts,
> but I would like to ask you check the new fgraph based fprobe
> (kretprobe-multi)[1] instead of objpool/rethook.

You can see that I did :) There is tons of code and I'm not familiar
with internals of function_graph infra, but you can see I did run
benchmarks, so I'm paying attention.

But as for the objpool itself, I think it's a performant and useful
internal building block, and we might use it outside of rethook as
well, so I think making it as fast as possible is good regardless.

>
> [1] 
> https://lore.kernel.org/all/171318533841.254850.15841395205784342850.stgit@devnote2/
>
> I'm considering to obsolete the kretprobe (and rethook) with fprobe
> and eventually remove it. Those have similar feature and we should
> choose safer one.
>

Yep, I had a few more semi-ready patches, but I'll hold off for now
given this move to function graph, plus some of the changes that Jiri
is making in multi-kprobe code. I'll wait for things to settle down a
bit before looking again.

But just to give you some context, I'm an author of retsnoop tool, and
one of the killer features of it is LBR capture in kretprobes, which
is a tremendous help in investigating kernel failures, especially in
unfamiliar code (LBR allows to "look back" and figure out "how did we
get to this condition" after the fact). And so it's important to
minimize the amount of wasted LBR records between some kernel function
returns error (and thus is "an interesting event" and BPF program that
captures LBR is triggered). Big part of that is ftrace/fprobe/rethook
infra, so I was looking into making that part as "minimal" as
possible, in the sense of eliminating as many function calls and jump
as possible. This has an added benefit of making this hot path faster,
but my main motivation is LBR.

Anyways, just a bit of context for some of the other patches (like
inlining arch_rethook_trampoline_callback).

> Thank you,
>
> >
> > These opportunities were found when benchmarking and profiling kprobes and
> > kretprobes with BPF-based benchmarks. See individual patches for details and
> > results.
> >
> > Andrii Nakryiko (2):
> >   objpool: enable inlining objpool_push() and objpool_pop() operations
> >   objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
> >
> >  include/linux/objpool.h | 105 +++--
> >  lib/objpool.c   | 112 +++-
> >  2 files changed, 107 insertions(+), 110 deletions(-)
> >
> > --
> > 2.43.0
> >
>
>
> --
> Masami Hiramatsu (Google) 



Re: [PATCH] tracing/probes: Fix memory leak issues in traceprobe_parse_probe_arg_body()

2024-04-26 Thread Markus Elfring
…
> Therefore, the program should jump to the fail label instead of the out 
> label. This commit fixes this bug.
>
> Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>

Please improve your patch attempt considerably.

See also:
https://kernelnewbies.org/FirstKernelPatch


Are there further change opportunities to take better into account?
https://elixir.bootlin.com/linux/v6.9-rc5/source/kernel/trace/trace_probe.c#L1403

Regards,
Markus



Re: [PATCH] kernel/trace/trace_probe:Fixed memory leak issues in trace_probe.c.

2024-04-26 Thread Google
Hi LuMingYin,

Thanks for finding the problem! But please make a commit message
following Documentation/process/submitting-patches.rst


On Fri, 26 Apr 2024 10:13:43 +0100
lumingyindet...@126.com wrote:

> From: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
> 
> At line 1408 of the file /linux/kernel/trace/trace_probe.c, pointer variables 
> named code and tmp are defined. At line 1437, a new dynamic memory area is 
> allocated using the function kcalloc. When the if statement at line 1467 
> evaluates to true, the program jumps to the out label at line 1469. Within 
> this function, there are two labels: out and fail. The difference between 
> these two labels is that fail additionally frees the dynamic memory area 
> pointed to by the variable code. Therefore, the program should jump to the 
> fail label instead of the out label. This commit fixes this bug.
> 

For example, you must line break after about 70 characters. Also,
please don't use the line number because the line number is easily
changed (function name is OK). Since this bug is very clear mistake,
so you can just explain that as following.


 If traceprobe_parse_probe_arg_body() fails to allocate 'parg->fmt', it
 jumps to 'out' instead of 'fail' by mistake. In the result, in this
 case the 'tmp' buffer is not freed and leaks its memory.

 Fix it by jumping to 'fail' in that case.

The first paragraph explains what happens, and second one to exaplain
how to fix it.

Also, please add this Fixes tag.

Fixes: 032330abd08b ("tracing/probes: Cleanup probe argument parser")

You can easily find this commit number with git blame.

Thank you,

> Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
> ---
>  kernel/trace/trace_probe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index dfe3ee6035ec..42bc0f362226 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char 
> *argv, ssize_t *size,
>   parg->fmt = kmalloc(len, GFP_KERNEL);
>   if (!parg->fmt) {
>   ret = -ENOMEM;
> - goto out;
> + goto fail;
>   }
>   snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
>parg->count);
> -- 
> 2.25.1
> 


-- 
Masami Hiramatsu (Google) 



[PATCH v3 0/2] virtiofs: fix the warning for kernel direct IO

2024-04-26 Thread Hou Tao
From: Hou Tao 

Hi,

The patch set aims to fix the warning related to an abnormal size
parameter of kmalloc() in virtiofs. Patch #1 fixes it by introducing
use_pages_for_kvec_io option in fuse_conn and enabling it in virtiofs.
Beside the abnormal size parameter for kmalloc, the gfp parameter is
also questionable: GFP_ATOMIC is used even when the allocation occurs
in a kworker context. Patch #2 fixes it by using GFP_NOFS when the
allocation is initiated by the kworker. For more details, please check
the individual patches.

As usual, comments are always welcome.

Change Log:

v3:
 * introduce use_pages_for_kvec_io for virtiofs. When the option is
   enabled, fuse will use iov_iter_extract_pages() to construct a page
   array and pass the pages array instead of a pointer to virtiofs.
   The benefit is twofold: the length of the data passed to virtiofs is
   limited by max_pages, and there is no memory copy compared with v2.

v2: 
https://lore.kernel.org/linux-fsdevel/20240228144126.2864064-1-hou...@huaweicloud.com/
  * limit the length of ITER_KVEC dio by max_pages instead of the
newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC
dio being consistent with other rw operations.
  * replace kmalloc-allocated bounce buffer by using a bounce buffer
backed by scattered pages when the length of the bounce buffer for
KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with
fragmented memory, the KVEC_ITER dio can be handled normally by
virtiofs. (Bernd Schubert)
  * merge the GFP_NOFS patch [1] into this patch-set and use
memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
(Benjamin Coddington)

v1: 
https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/

[1]: 
https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/

Hou Tao (2):
  virtiofs: use pages instead of pointer for kernel direct IO
  virtiofs: use GFP_NOFS when enqueuing request through kworker

 fs/fuse/file.c  | 12 
 fs/fuse/fuse_i.h|  3 +++
 fs/fuse/virtio_fs.c | 25 -
 3 files changed, 27 insertions(+), 13 deletions(-)

-- 
2.29.2




[PATCH v3 1/2] virtiofs: use pages instead of pointer for kernel direct IO

2024-04-26 Thread Hou Tao
From: Hou Tao 

When trying to insert a 10MB kernel module kept in a virtio-fs with cache
disabled, the following warning was reported:

  [ cut here ]
  WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ..
  Modules linked in:
  CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ..
  RIP: 0010:__alloc_pages+0x2bf/0x380
  ..
  Call Trace:
   
   ? __warn+0x8e/0x150
   ? __alloc_pages+0x2bf/0x380
   __kmalloc_large_node+0x86/0x160
   __kmalloc+0x33c/0x480
   virtio_fs_enqueue_req+0x240/0x6d0
   virtio_fs_wake_pending_and_unlock+0x7f/0x190
   queue_request_and_unlock+0x55/0x60
   fuse_simple_request+0x152/0x2b0
   fuse_direct_io+0x5d2/0x8c0
   fuse_file_read_iter+0x121/0x160
   __kernel_read+0x151/0x2d0
   kernel_read+0x45/0x50
   kernel_read_file+0x1a9/0x2a0
   init_module_from_file+0x6a/0xe0
   idempotent_init_module+0x175/0x230
   __x64_sys_finit_module+0x5d/0xb0
   x64_sys_call+0x1c3/0x9e0
   do_syscall_64+0x3d/0xc0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   ..
   
  ---[ end trace  ]---

The warning is triggered as follows:

1) syscall finit_module() handles the module insertion and it invokes
kernel_read_file() to read the content of the module first.

2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
passes it to kernel_read(). kernel_read() constructs a kvec iter by
using iov_iter_kvec() and passes it to fuse_file_read_iter().

3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
fuse_direct_io(). As for now, the maximal read size for kvec iter is
only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
the size of the 10MB-sized buffer in out_args[0] of a fuse request and
passes the fuse request to virtio_fs_wake_pending_and_unlock().

4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
queue the request. Because virtiofs need DMA-able address, so
virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for
all fuse args, copies these args into the bounce buffer and passed the
physical address of the bounce buffer to virtiofsd. The total length of
these fuse args for the passed fuse request is about 10MB, so
copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and
it triggers the warning in __alloc_pages():

if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
return NULL;

5) virtio_fs_enqueue_req() will retry the memory allocation in a
kworker, but it won't help, because kmalloc() will always return NULL
due to the abnormal size and finit_module() will hang forever.

A feasible solution is to limit the value of max_read for virtio-fs, so
the length passed to kmalloc() will be limited. However it will affect
the maximal read size for normal read. And for virtio-fs write initiated
from kernel, it has the similar problem but now there is no way to limit
fc->max_write in kernel.

So instead of limiting both the values of max_read and max_write in
kernel, introducing use_pages_for_kvec_io in fuse_conn and setting it as
true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use
pages instead of pointer to pass the KVEC_IO data.

Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Hou Tao 
---
 fs/fuse/file.c  | 12 
 fs/fuse/fuse_i.h|  3 +++
 fs/fuse/virtio_fs.c |  1 +
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b57ce41576407..82b77c5d8c643 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1471,13 +1471,17 @@ static inline size_t fuse_get_frag_size(const struct 
iov_iter *ii,
 
 static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
   size_t *nbytesp, int write,
-  unsigned int max_pages)
+  unsigned int max_pages,
+  bool use_pages_for_kvec_io)
 {
size_t nbytes = 0;  /* # bytes already packed in req */
ssize_t ret = 0;
 
-   /* Special case for kernel I/O: can copy directly into the buffer */
-   if (iov_iter_is_kvec(ii)) {
+   /* Special case for kernel I/O: can copy directly into the buffer.
+* However if the implementation of fuse_conn requires pages instead of
+* pointer (e.g., virtio-fs), use iov_iter_extract_pages() instead.
+*/
+   if (iov_iter_is_kvec(ii) && !use_pages_for_kvec_io) {
unsigned long user_addr = fuse_get_user_addr(ii);
size_t frag_size = fuse_get_frag_size(ii, *nbytesp);
 
@@ -1585,7 +1589,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct 
iov_iter *iter,
size_t nbytes = min(count, nmax);
 
err = fuse_get_user_pages(>ap, iter, , write,
- max_pages);
+ 

[PATCH v3 2/2] virtiofs: use GFP_NOFS when enqueuing request through kworker

2024-04-26 Thread Hou Tao
From: Hou Tao 

When invoking virtio_fs_enqueue_req() through kworker, both the
allocation of the sg array and the bounce buffer still use GFP_ATOMIC.
Considering the size of the sg array may be greater than PAGE_SIZE, use
GFP_NOFS instead of GFP_ATOMIC to lower the possibility of memory
allocation failure and to avoid unnecessarily depleting the atomic
reserves. GFP_NOFS is not passed to virtio_fs_enqueue_req() directly,
GFP_KERNEL and memalloc_nofs_{save|restore} helpers are used instead.

Signed-off-by: Hou Tao 
---
 fs/fuse/virtio_fs.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 36984c0e23d14..096b589ed2fcc 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -91,7 +91,8 @@ struct virtio_fs_req_work {
 };
 
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
-struct fuse_req *req, bool in_flight);
+struct fuse_req *req, bool in_flight,
+gfp_t gfp);
 
 static const struct constant_table dax_param_enums[] = {
{"always",  FUSE_DAX_ALWAYS },
@@ -430,6 +431,8 @@ static void virtio_fs_request_dispatch_work(struct 
work_struct *work)
 
/* Dispatch pending requests */
while (1) {
+   unsigned int flags;
+
spin_lock(>lock);
req = list_first_entry_or_null(>queued_reqs,
   struct fuse_req, list);
@@ -440,7 +443,9 @@ static void virtio_fs_request_dispatch_work(struct 
work_struct *work)
list_del_init(>list);
spin_unlock(>lock);
 
-   ret = virtio_fs_enqueue_req(fsvq, req, true);
+   flags = memalloc_nofs_save();
+   ret = virtio_fs_enqueue_req(fsvq, req, true, GFP_KERNEL);
+   memalloc_nofs_restore(flags);
if (ret < 0) {
if (ret == -ENOMEM || ret == -ENOSPC) {
spin_lock(>lock);
@@ -545,7 +550,7 @@ static void virtio_fs_hiprio_dispatch_work(struct 
work_struct *work)
 }
 
 /* Allocate and copy args into req->argbuf */
-static int copy_args_to_argbuf(struct fuse_req *req)
+static int copy_args_to_argbuf(struct fuse_req *req, gfp_t gfp)
 {
struct fuse_args *args = req->args;
unsigned int offset = 0;
@@ -559,7 +564,7 @@ static int copy_args_to_argbuf(struct fuse_req *req)
len = fuse_len_args(num_in, (struct fuse_arg *) args->in_args) +
  fuse_len_args(num_out, args->out_args);
 
-   req->argbuf = kmalloc(len, GFP_ATOMIC);
+   req->argbuf = kmalloc(len, gfp);
if (!req->argbuf)
return -ENOMEM;
 
@@ -1183,7 +1188,8 @@ static unsigned int sg_init_fuse_args(struct scatterlist 
*sg,
 
 /* Add a request to a virtqueue and kick the device */
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
-struct fuse_req *req, bool in_flight)
+struct fuse_req *req, bool in_flight,
+gfp_t gfp)
 {
/* requests need at least 4 elements */
struct scatterlist *stack_sgs[6];
@@ -1204,8 +1210,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq 
*fsvq,
/* Does the sglist fit on the stack? */
total_sgs = sg_count_fuse_req(req);
if (total_sgs > ARRAY_SIZE(stack_sgs)) {
-   sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC);
-   sg = kmalloc_array(total_sgs, sizeof(sg[0]), GFP_ATOMIC);
+   sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), gfp);
+   sg = kmalloc_array(total_sgs, sizeof(sg[0]), gfp);
if (!sgs || !sg) {
ret = -ENOMEM;
goto out;
@@ -1213,7 +1219,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq 
*fsvq,
}
 
/* Use a bounce buffer since stack args cannot be mapped */
-   ret = copy_args_to_argbuf(req);
+   ret = copy_args_to_argbuf(req, gfp);
if (ret < 0)
goto out;
 
@@ -1309,7 +1315,7 @@ __releases(fiq->lock)
 fuse_len_args(req->args->out_numargs, req->args->out_args));
 
fsvq = >vqs[queue_id];
-   ret = virtio_fs_enqueue_req(fsvq, req, false);
+   ret = virtio_fs_enqueue_req(fsvq, req, false, GFP_ATOMIC);
if (ret < 0) {
if (ret == -ENOMEM || ret == -ENOSPC) {
/*
-- 
2.29.2




Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing

2024-04-26 Thread Dave Hansen
On 4/16/24 07:15, Jarkko Sakkinen wrote:
> On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote:
> Yes, exactly. I'd take one week break and cycle the kselftest part
> internally a bit as I said my previous response. I'm sure that there
> is experise inside Intel how to implement it properly. I.e. take some
> time to find the right person, and wait as long as that person has a
> bit of bandwidth to go through the test and suggest modifications.

Folks, I worry that this series is getting bogged down in the selftests.
 Yes, selftests are important.  But getting _some_ tests in the kernel
is substantially more important than getting perfect tests.

I don't think Haitao needs to "cycle" this back inside Intel.



Re: [PATCH 0/2] Objpool performance improvements

2024-04-26 Thread Google
Hi Andrii,

On Wed, 24 Apr 2024 14:52:12 -0700
Andrii Nakryiko  wrote:

> Improve objpool (used heavily in kretprobe hot path) performance with two
> improvements:
>   - inlining performance critical objpool_push()/objpool_pop() operations;
>   - avoiding re-calculating relatively expensive nr_possible_cpus().

Thanks for optimizing objpool. Both looks good to me.

BTW, I don't intend to stop this short-term optimization attempts,
but I would like to ask you check the new fgraph based fprobe 
(kretprobe-multi)[1] instead of objpool/rethook.

[1] 
https://lore.kernel.org/all/171318533841.254850.15841395205784342850.stgit@devnote2/

I'm considering to obsolete the kretprobe (and rethook) with fprobe
and eventually remove it. Those have similar feature and we should
choose safer one.

Thank you,

> 
> These opportunities were found when benchmarking and profiling kprobes and
> kretprobes with BPF-based benchmarks. See individual patches for details and
> results.
> 
> Andrii Nakryiko (2):
>   objpool: enable inlining objpool_push() and objpool_pop() operations
>   objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
> 
>  include/linux/objpool.h | 105 +++--
>  lib/objpool.c   | 112 +++-
>  2 files changed, 107 insertions(+), 110 deletions(-)
> 
> -- 
> 2.43.0
> 


-- 
Masami Hiramatsu (Google) 



[RFC PATCH v2 1/1] x86/sgx: Explicitly give up the CPU in EDMM's ioctl() to avoid softlockup

2024-04-26 Thread Bojun Zhu
EDMM's ioctl()s support batch operations, which may be
time-consuming. Try to explicitly give up the CPU as the prefix
operation at the every begin of "for loop" in
sgx_enclave_{ modify_types | restrict_permissions | remove_pages}
to give other tasks a chance to run, and avoid softlockup warning.

Additionally perform pending signals check as the prefix operation,
and introduce sgx_check_signal_and_resched(),
which wraps all the checks.

The following has been observed on Linux v6.9-rc5 with kernel
preemptions disabled(by configuring "PREEMPT_NONE=y"), when kernel
is requested to restrict page permissions of a large number of EPC pages.

[ cut here ]
watchdog: BUG: soft lockup - CPU#45 stuck for 22s!
...
RIP: 0010:sgx_enclave_restrict_permissions+0xba/0x1f0
...
Call Trace:
 sgx_ioctl
 __x64_sys_ioctl
 x64_sys_call
 do_syscall_64
 entry_SYSCALL_64_after_hwframe
[ end trace ]

Signed-off-by: Bojun Zhu 
---
 arch/x86/kernel/cpu/sgx/ioctl.c | 40 +
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b65ab214bdf5..e0645920bcb5 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -365,6 +365,20 @@ static int sgx_validate_offset_length(struct sgx_encl 
*encl,
return 0;
 }
 
+/*
+ * Check signals and invoke scheduler. Return true for a pending signal.
+ */
+static bool sgx_check_signal_and_resched(void)
+{
+   if (signal_pending(current))
+   return true;
+
+   if (need_resched())
+   cond_resched();
+
+   return false;
+}
+
 /**
  * sgx_ioc_enclave_add_pages() - The handler for %SGX_IOC_ENCLAVE_ADD_PAGES
  * @encl:   an enclave pointer
@@ -432,16 +446,13 @@ static long sgx_ioc_enclave_add_pages(struct sgx_encl 
*encl, void __user *arg)
return -EINVAL;
 
for (c = 0 ; c < add_arg.length; c += PAGE_SIZE) {
-   if (signal_pending(current)) {
+   if (sgx_check_signal_and_resched()) {
if (!c)
ret = -ERESTARTSYS;
 
break;
}
 
-   if (need_resched())
-   cond_resched();
-
ret = sgx_encl_add_page(encl, add_arg.src + c, add_arg.offset + 
c,
, add_arg.flags);
if (ret)
@@ -746,6 +757,13 @@ sgx_enclave_restrict_permissions(struct sgx_encl *encl,
secinfo.flags = modp->permissions & SGX_SECINFO_PERMISSION_MASK;
 
for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
+   if (sgx_check_signal_and_resched()) {
+   if (!c)
+   ret = -ERESTARTSYS;
+
+   goto out;
+   }
+
addr = encl->base + modp->offset + c;
 
sgx_reclaim_direct();
@@ -913,6 +931,13 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
secinfo.flags = page_type << 8;
 
for (c = 0 ; c < modt->length; c += PAGE_SIZE) {
+   if (sgx_check_signal_and_resched()) {
+   if (!c)
+   ret = -ERESTARTSYS;
+
+   goto out;
+   }
+
addr = encl->base + modt->offset + c;
 
sgx_reclaim_direct();
@@ -1101,6 +1126,13 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
secinfo.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
 
for (c = 0 ; c < params->length; c += PAGE_SIZE) {
+   if (sgx_check_signal_and_resched()) {
+   if (!c)
+   ret = -ERESTARTSYS;
+
+   goto out;
+   }
+
addr = encl->base + params->offset + c;
 
sgx_reclaim_direct();
-- 
2.25.1




[RFC PATCH v2 0/1] x86/sgx: Explicitly give up the CPU in EDMM's ioctl() to avoid softlockup

2024-04-26 Thread Bojun Zhu
Hi forks,

This is the second version of the patch to fix the softlockup in EDMM 
iotcl()[1].

If we run an enclave equipped with large EPC(30G or greater on my platfrom)
on the Linux with kernel preemptions disabled(by configuring
"CONFIG_PREEMPT_NONE=y"), we will get the following softlockup warning 
messages being reported in "dmesg" log:

The EDMM's ioctl()s (sgx_ioc_enclave_{ modify_types | restrict_permissions |  
remove_pages}) 
interface provided by kernel support batch changing attributes of enclave's EPC.
If userspace App requests kernel to handle too many EPC pages, kernel
may stuck for a long time(with preemption disabled).

The log is as follows:

[ cut here ]
[  901.101294] watchdog: BUG: soft lockup - CPU#92 stuck for 23s! 
[occlum-run:4289]
[  901.109617] Modules linked in: veth xt_conntrack xt_MASQUERADE 
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter 
br_netfilter bridge stp llc overlay nls_iso8859_1 intel_rapl_msr 
intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common 
i10nm_edac nfit binfmt_misc ipmi_ssif x86_pkg_temp_thermal intel_powerclamp 
coretemp kvm_intel kvm crct10dif_pclmul polyval_clmulni polyval_generic 
ghash_clmulni_intel sha512_ssse3 sha256_ssse3 pmt_telemetry sha1_ssse3 
pmt_class joydev intel_sdsi input_leds aesni_intel crypto_simd cryptd dax_hmem 
cxl_acpi cmdlinepart rapl cxl_core ast spi_nor intel_cstate drm_shmem_helper 
einj mtd drm_kms_helper mei_me idxd isst_if_mmio isst_if_mbox_pci 
isst_if_common intel_vsec idxd_bus mei acpi_ipmi ipmi_si ipmi_devintf 
ipmi_msghandler acpi_pad acpi_power_meter mac_hid sch_fq_codel msr parport_pc 
ppdev lp parport ramoops reed_solomon pstore_blk pstore_zone efi_pstore drm 
ip_tables x_tables
[  901.109670]  autofs4 mlx5_ib ib_uverbs ib_core hid_generic usbhid hid ses 
enclosure scsi_transport_sas mlx5_core pci_hyperv_intf mlxfw igb ahci psample 
i2c_algo_bit i2c_i801 spi_intel_pci xhci_pci tls megaraid_sas dca spi_intel 
crc32_pclmul i2c_smbus i2c_ismt libahci xhci_pci_renesas wmi pinctrl_emmitsburg
[  901.109691] CPU: 92 PID: 4289 Comm: occlum-run Not tainted 6.9.0-rc5 #3
[  901.109693] Hardware name: Inspur NF5468-M7-A0-R0-00/NF5468-M7-A0-R0-00, 
BIOS 05.02.01 05/08/2023
[  901.109695] RIP: 0010:sgx_enclave_restrict_permissions+0xba/0x1f0
[  901.109701] Code: 48 c1 e6 05 48 89 d1 48 8d 5c 24 40 b8 0e 00 00 00 48 2b 
8e 70 8e 15 8b 48 c1 e9 05 48 c1 e1 0c 48 03 8e 68 8e 15 8b 0f 01 cf  00 00 
00 40 0f 85 b2 00 00 00 85 c0 0f 85 db 00 00 00 4c 89 ef
[  901.109702] RSP: 0018:ad0ae5d0f8c0 EFLAGS: 0202
[  901.109704] RAX:  RBX: ad0ae5d0f900 RCX: ad11dfc0e000
[  901.109705] RDX: ad2adcff81c0 RSI:  RDI: 9a12f5f4f000
[  901.109706] RBP: ad0ae5d0f9b0 R08: 0002 R09: 9a1289f57520
[  901.109707] R10: 005d R11: 0002 R12: 0006d8ff2000
[  901.109708] R13: 9a12f5f4f000 R14: ad0ae5d0fa18 R15: 9a12f5f4f020
[  901.109709] FS:  7fb20ad1d740() GS:9a317fe0() 
knlGS:
[  901.109710] CS:  0010 DS:  ES:  CR0: 80050033
[  901.109711] CR2: 7f8041811000 CR3: 000118530006 CR4: 00770ef0
[  901.109712] DR0:  DR1:  DR2: 
[  901.109713] DR3:  DR6: fffe07f0 DR7: 0400
[  901.109714] PKRU: 5554
[  901.109714] Call Trace:
[  901.109716]  
[  901.109718]  ? show_regs+0x67/0x70
[  901.109722]  ? watchdog_timer_fn+0x1f3/0x280
[  901.109725]  ? __pfx_watchdog_timer_fn+0x10/0x10
[  901.109727]  ? __hrtimer_run_queues+0xc8/0x220
[  901.109731]  ? hrtimer_interrupt+0x10c/0x250
[  901.109733]  ? __sysvec_apic_timer_interrupt+0x53/0x130
[  901.109736]  ? sysvec_apic_timer_interrupt+0x7b/0x90
[  901.109739]  
[  901.109740]  
[  901.109740]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  901.109745]  ? sgx_enclave_restrict_permissions+0xba/0x1f0
[  901.109747]  ? aa_file_perm+0x145/0x550
[  901.109750]  sgx_ioctl+0x1ab/0x900
[  901.109751]  ? xas_find+0x84/0x200
[  901.109754]  ? sgx_enclave_etrack+0xbb/0x140
[  901.109756]  ? sgx_encl_may_map+0x19a/0x240
[  901.109758]  ? common_file_perm+0x8a/0x1b0
[  901.109760]  ? obj_cgroup_charge_pages+0xa2/0x100
[  901.109763]  ? tlb_flush_mmu+0x31/0x1c0
[  901.109766]  ? tlb_finish_mmu+0x42/0x80
[  901.109767]  ? do_mprotect_pkey+0x150/0x530
[  901.109769]  ? __fget_light+0xc0/0x100
[  901.109772]  __x64_sys_ioctl+0x95/0xd0
[  901.109775]  x64_sys_call+0x1209/0x20c0
[  901.109777]  do_syscall_64+0x6d/0x110
[  901.109779]  ? syscall_exit_to_user_mode+0x86/0x1c0
[  901.109782]  ? do_syscall_64+0x79/0x110
[  901.109783]  ? syscall_exit_to_user_mode+0x86/0x1c0
[  901.109784]  ? do_syscall_64+0x79/0x110
[  901.109785]  ? free_unref_page+0x10e/0x180
[  901.109788]  ? __do_fault+0x36/0x130
[  901.109791]  ? 

Re: [PATCH v7 0/6] soc: qcom: add in-kernel pd-mapper implementation

2024-04-26 Thread Alexey Minnekhanov




On 24.04.2024 12:27, Dmitry Baryshkov wrote:

Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issue. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

Unlike previous revisions of the patchset, this iteration uses static
configuration per platform, rather than building it dynamically from the
list of DSPs being started.

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold 
Cc: Xilin Wu 
Cc: "Bryan O'Donoghue" 
--

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: 
https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
   builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
   silence the notice on those platforms
- Link to v5: 
https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
   (Bjorn)
- Link to v4: 
https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: 
https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
   them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404



I've tested this series on sdm660 device, with userspace pd-mapper
service disabled, and don't see any regressions - e.g Wi-Fi/BT
still come online and work as before.

Debug logs:
https://paste.sr.ht/~minlexx/bd03db4c582a3275078ce4fd05ea76ce46a52b8e

Missing cdsp_root and adsp_sensors PDs are not currently an issue,
because those are not enabled yet on SDM660 or hard to test, so

Tested-by: Alexey Minnekhanov 

--
Regards,
Alexey Minnekhanov
postmarketOS developer



Re: [PATCH v7 5/6] soc: qcom: add pd-mapper implementation

2024-04-26 Thread Alexey Minnekhanov

On 24.04.2024 12:28, Dmitry Baryshkov wrote:

Existing userspace protection domain mapper implementation has several
issue. It doesn't play well with CONFIG_EXTRA_FIRMWARE, it doesn't
reread JSON files if firmware location is changed (or if firmware was
not available at the time pd-mapper was started but the corresponding
directory is mounted later), etc.

Provide in-kernel service implementing protection domain mapping
required to work with several services, which are provided by the DSP
firmware.

This module is loaded automatically by the remoteproc drivers when
necessary via the symbol dependency. It uses a root node to match a
protection domains map for a particular board. It is not possible to
implement it as a 'driver' as there is no corresponding device.

Signed-off-by: Dmitry Baryshkov 




diff --git a/drivers/soc/qcom/qcom_pd_mapper.c 
b/drivers/soc/qcom/qcom_pd_mapper.c

...

+
+static const struct qcom_pdm_domain_data *sdm660_domains[] = {
+   _audio_pd,
+   _root_pd,
+   _wlan_pd,
+   NULL,
+};
+


On my SDM660 device (xiaomi-lavender) I see also the following files on 
modem partition:

 - adsps.jsn with sensor_pd (instance id 74)
 - cdspr.jsn with cdsp root_pd (instance id 76)
 - modemr.jsn with root_pd (instance id 180)

I see these numbers match those you already have defined above, so 
perhaps sdm660_domains should also have adsp_sensor_pd, cdsp_root_pd, 
and mpss_root_pd?


I'm not sure how useful they are currently, as AFAIK cdsp is not added 
to sdm660 DT at all; and ADSP sensors are very hard to use/test, needs 
very special userspace...


--
Regards,
Alexey Minnekhanov
postmarketOS developer



Re: [PATCH net-next v9 0/7] Implement reset reason mechanism to detect

2024-04-26 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (main)
by Paolo Abeni :

On Thu, 25 Apr 2024 11:13:33 +0800 you wrote:
> From: Jason Xing 
> 
> In production, there are so many cases about why the RST skb is sent but
> we don't have a very convenient/fast method to detect the exact underlying
> reasons.
> 
> RST is implemented in two kinds: passive kind (like tcp_v4_send_reset())
> and active kind (like tcp_send_active_reset()). The former can be traced
> carefully 1) in TCP, with the help of drop reasons, which is based on
> Eric's idea[1], 2) in MPTCP, with the help of reset options defined in
> RFC 8684. The latter is relatively independent, which should be
> implemented on our own, such as active reset reasons which can not be
> replace by skb drop reason or something like this.
> 
> [...]

Here is the summary with links:
  - [net-next,v9,1/7] net: introduce rstreason to detect why the RST is sent
https://git.kernel.org/netdev/net-next/c/5cb2cb3cb20c
  - [net-next,v9,2/7] rstreason: prepare for passive reset
https://git.kernel.org/netdev/net-next/c/6be49deaa095
  - [net-next,v9,3/7] rstreason: prepare for active reset
https://git.kernel.org/netdev/net-next/c/5691276b39da
  - [net-next,v9,4/7] tcp: support rstreason for passive reset
https://git.kernel.org/netdev/net-next/c/120391ef9ca8
  - [net-next,v9,5/7] mptcp: support rstreason for passive reset
https://git.kernel.org/netdev/net-next/c/3e140491dd80
  - [net-next,v9,6/7] mptcp: introducing a helper into active reset logic
https://git.kernel.org/netdev/net-next/c/215d40248bde
  - [net-next,v9,7/7] rstreason: make it work in trace world
https://git.kernel.org/netdev/net-next/c/b533fb9cf4f7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





[PATCH] kernel/trace/trace_probe:Fixed memory leak issues in trace_probe.c.

2024-04-26 Thread lumingyindetect
From: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>

At line 1408 of the file /linux/kernel/trace/trace_probe.c, pointer variables 
named code and tmp are defined. At line 1437, a new dynamic memory area is 
allocated using the function kcalloc. When the if statement at line 1467 
evaluates to true, the program jumps to the out label at line 1469. Within this 
function, there are two labels: out and fail. The difference between these two 
labels is that fail additionally frees the dynamic memory area pointed to by 
the variable code. Therefore, the program should jump to the fail label instead 
of the out label. This commit fixes this bug.

Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
---
 kernel/trace/trace_probe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index dfe3ee6035ec..42bc0f362226 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char 
*argv, ssize_t *size,
parg->fmt = kmalloc(len, GFP_KERNEL);
if (!parg->fmt) {
ret = -ENOMEM;
-   goto out;
+   goto fail;
}
snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
 parg->count);
-- 
2.25.1




Re: [PATCH net-next v7] net/ipv4: add tracepoint for icmp_send

2024-04-26 Thread Eric Dumazet
On Fri, Apr 26, 2024 at 10:47 AM  wrote:
>
> From: Peilin He 
>
> Introduce a tracepoint for icmp_send, which can help users to get more
> detail information conveniently when icmp abnormal events happen.
>
> 1. Giving an usecase example:
> =
> When an application experiences packet loss due to an unreachable UDP
> destination port, the kernel will send an exception message through the
> icmp_send function. By adding a trace point for icmp_send, developers or
> system administrators can obtain detailed information about the UDP
> packet loss, including the type, code, source address, destination address,
> source port, and destination port. This facilitates the trouble-shooting
> of UDP packet loss issues especially for those network-service
> applications.
>
> 2. Operation Instructions:
> ==
> Switch to the tracing directory.
> cd /sys/kernel/tracing
> Filter for destination port unreachable.
> echo "type==3 && code==3" > events/icmp/icmp_send/filter
> Enable trace event.
> echo 1 > events/icmp/icmp_send/enable
>
> 3. Result View:
> 
>  udp_client_erro-11370   [002] ...s.12   124.728002:
>  icmp_send: icmp_send: type=3, code=3.
>  From 127.0.0.1:41895 to 127.0.0.1: ulen=23
>  skbaddr=589b167a
>
> Change log:
> ===
> v6->v7:
> Some fixes according to
> https://lore.kernel.org/all/20240425081210.720a4...@kernel.org/
> 1. Fix patch format issues.
>
> v5->v6:
> Some fixes according to
> https://lore.kernel.org/all/20240413161319.ga853...@kernel.org/
> 1.Resubmit patches based on the latest net-next code.
>
> v4->v5:
> Some fixes according to
> https://lore.kernel.org/all/CAL+tcoDeXXh+zcRk4PHnUk8ELnx=CE2pc
> cqs7sfm0y9ak-e...@mail.gmail.com/
> 1.Adjust the position of trace_icmp_send() to before icmp_push_reply().
>
> v3->v4:
> Some fixes according to
> https://lore.kernel.org/all/CANn89i+EFEr7VHXNdOi59Ba_R1nFKSBJz
> BzkJFVgCTdXBx=y...@mail.gmail.com/
> 1.Add legality check for UDP header in SKB.
> 2.Target this patch for net-next.
>
> v2->v3:
> Some fixes according to
> https://lore.kernel.org/all/20240319102549.7f7f6...@gandalf.local.home/
> 1. Change the tracking directory to/sys/kernel/tracking.
> 2. Adjust the layout of the TP-STRUCT_entry parameter structure.
>
> v1->v2:
> Some fixes according to
> https://lore.kernel.org/all/CANn89iL-y9e_VFpdw=sZtRnKRu_tnUwqHu
> fqtjvjsv-nz1x...@mail.gmail.com/
> 1. adjust the trace_icmp_send() to more protocols than UDP.
> 2. move the calling of trace_icmp_send after sanity checks
> in __icmp_send().
>
> Signed-off-by: Peilin He 
> Signed-off-by: xu xin 
> Reviewed-by: Yunkai Zhang 
> Cc: Yang Yang 
> Cc: Liu Chun 
> Cc: Xuexin Jiang 
> ---
>  include/trace/events/icmp.h | 68 +
>  net/ipv4/icmp.c |  4 +++
>  2 files changed, 72 insertions(+)
>  create mode 100644 include/trace/events/icmp.h
>
> diff --git a/include/trace/events/icmp.h b/include/trace/events/icmp.h
> new file mode 100644
> index ..cff33aaee44e
> --- /dev/null
> +++ b/include/trace/events/icmp.h
> @@ -0,0 +1,68 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM icmp
> +
> +#if !defined(_TRACE_ICMP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_ICMP_H
> +
> +#include 
> +#include 
> +
> +TRACE_EVENT(icmp_send,
> +
> +   TP_PROTO(const struct sk_buff *skb, int type, int code),
> +
> +   TP_ARGS(skb, type, code),
> +
> +   TP_STRUCT__entry(
> +   __field(const void *, skbaddr)
> +   __field(int, type)
> +   __field(int, code)
> +   __array(__u8, saddr, 4)
> +   __array(__u8, daddr, 4)
> +   __field(__u16, sport)
> +   __field(__u16, dport)
> +   __field(unsigned short, ulen)
> +   ),
> +
> +   TP_fast_assign(
> +   struct iphdr *iph = ip_hdr(skb);
> +   int proto_4 = iph->protocol;
> +   __be32 *p32;
> +
> +   __entry->skbaddr = skb;
> +   __entry->type = type;
> +   __entry->code = code;
> +
> +   struct udphdr *uh = udp_hdr(skb);

Please move this line up., perhaps after the "struct iphdr *iph =
ip_hdr(skb);" one

We group all variable definitions together, we do not mix code and variables.



> +
> +   if (proto_4 != IPPROTO_UDP || (u8 *)uh < skb->head ||
> +   (u8 *)uh + sizeof(struct udphdr)
> +   > skb_tail_pointer(skb)) {
> +   __entry->sport = 0;
> +   __entry->dport = 0;
> +   __entry->ulen = 0;
> +   } else {
> +   

[PATCH net-next v7] net/ipv4: add tracepoint for icmp_send

2024-04-26 Thread xu.xin16
From: Peilin He 

Introduce a tracepoint for icmp_send, which can help users to get more
detail information conveniently when icmp abnormal events happen.

1. Giving an usecase example:
=
When an application experiences packet loss due to an unreachable UDP
destination port, the kernel will send an exception message through the
icmp_send function. By adding a trace point for icmp_send, developers or
system administrators can obtain detailed information about the UDP
packet loss, including the type, code, source address, destination address,
source port, and destination port. This facilitates the trouble-shooting
of UDP packet loss issues especially for those network-service
applications.

2. Operation Instructions:
==
Switch to the tracing directory.
cd /sys/kernel/tracing
Filter for destination port unreachable.
echo "type==3 && code==3" > events/icmp/icmp_send/filter
Enable trace event.
echo 1 > events/icmp/icmp_send/enable

3. Result View:

 udp_client_erro-11370   [002] ...s.12   124.728002:
 icmp_send: icmp_send: type=3, code=3.
 From 127.0.0.1:41895 to 127.0.0.1: ulen=23
 skbaddr=589b167a

Change log:
===
v6->v7:
Some fixes according to
https://lore.kernel.org/all/20240425081210.720a4...@kernel.org/
1. Fix patch format issues.

v5->v6:
Some fixes according to
https://lore.kernel.org/all/20240413161319.ga853...@kernel.org/
1.Resubmit patches based on the latest net-next code.

v4->v5:
Some fixes according to
https://lore.kernel.org/all/CAL+tcoDeXXh+zcRk4PHnUk8ELnx=CE2pc
cqs7sfm0y9ak-e...@mail.gmail.com/
1.Adjust the position of trace_icmp_send() to before icmp_push_reply().

v3->v4:
Some fixes according to
https://lore.kernel.org/all/CANn89i+EFEr7VHXNdOi59Ba_R1nFKSBJz
BzkJFVgCTdXBx=y...@mail.gmail.com/
1.Add legality check for UDP header in SKB.
2.Target this patch for net-next.

v2->v3:
Some fixes according to
https://lore.kernel.org/all/20240319102549.7f7f6...@gandalf.local.home/
1. Change the tracking directory to/sys/kernel/tracking.
2. Adjust the layout of the TP-STRUCT_entry parameter structure.

v1->v2:
Some fixes according to
https://lore.kernel.org/all/CANn89iL-y9e_VFpdw=sZtRnKRu_tnUwqHu
fqtjvjsv-nz1x...@mail.gmail.com/
1. adjust the trace_icmp_send() to more protocols than UDP.
2. move the calling of trace_icmp_send after sanity checks
in __icmp_send().

Signed-off-by: Peilin He 
Signed-off-by: xu xin 
Reviewed-by: Yunkai Zhang 
Cc: Yang Yang 
Cc: Liu Chun 
Cc: Xuexin Jiang 
---
 include/trace/events/icmp.h | 68 +
 net/ipv4/icmp.c |  4 +++
 2 files changed, 72 insertions(+)
 create mode 100644 include/trace/events/icmp.h

diff --git a/include/trace/events/icmp.h b/include/trace/events/icmp.h
new file mode 100644
index ..cff33aaee44e
--- /dev/null
+++ b/include/trace/events/icmp.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM icmp
+
+#if !defined(_TRACE_ICMP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_ICMP_H
+
+#include 
+#include 
+
+TRACE_EVENT(icmp_send,
+
+   TP_PROTO(const struct sk_buff *skb, int type, int code),
+
+   TP_ARGS(skb, type, code),
+
+   TP_STRUCT__entry(
+   __field(const void *, skbaddr)
+   __field(int, type)
+   __field(int, code)
+   __array(__u8, saddr, 4)
+   __array(__u8, daddr, 4)
+   __field(__u16, sport)
+   __field(__u16, dport)
+   __field(unsigned short, ulen)
+   ),
+
+   TP_fast_assign(
+   struct iphdr *iph = ip_hdr(skb);
+   int proto_4 = iph->protocol;
+   __be32 *p32;
+
+   __entry->skbaddr = skb;
+   __entry->type = type;
+   __entry->code = code;
+
+   struct udphdr *uh = udp_hdr(skb);
+
+   if (proto_4 != IPPROTO_UDP || (u8 *)uh < skb->head ||
+   (u8 *)uh + sizeof(struct udphdr)
+   > skb_tail_pointer(skb)) {
+   __entry->sport = 0;
+   __entry->dport = 0;
+   __entry->ulen = 0;
+   } else {
+   __entry->sport = ntohs(uh->source);
+   __entry->dport = ntohs(uh->dest);
+   __entry->ulen = ntohs(uh->len);
+   }
+
+   p32 = (__be32 *) __entry->saddr;
+   *p32 = iph->saddr;
+
+   p32 = (__be32 *) __entry->daddr;
+   *p32 = iph->daddr;
+   ),
+
+   TP_printk("icmp_send: type=%d, code=%d. 

[PATCH v6 16/16] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

BPF just-in-time compiler depended on CONFIG_MODULES because it used
module_alloc() to allocate memory for the generated code.

Since code allocations are now implemented with execmem, drop dependency of
CONFIG_BPF_JIT on CONFIG_MODULES and make it select CONFIG_EXECMEM.

Suggested-by: Björn Töpel 
Signed-off-by: Mike Rapoport (IBM) 
---
 kernel/bpf/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index bc25f5098a25..f999e4e0b344 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -43,7 +43,7 @@ config BPF_JIT
bool "Enable BPF Just In Time compiler"
depends on BPF
depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
-   depends on MODULES
+   select EXECMEM
help
  BPF programs are normally handled by a BPF interpreter. This option
  allows the kernel to generate native code when a program is loaded
-- 
2.43.0




[PATCH v6 15/16] kprobes: remove dependency on CONFIG_MODULES

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

kprobes depended on CONFIG_MODULES because it has to allocate memory for
code.

Since code allocations are now implemented with execmem, kprobes can be
enabled in non-modular kernels.

Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
dependency of CONFIG_KPROBES on CONFIG_MODULES.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/Kconfig|  2 +-
 include/linux/module.h  |  9 ++
 kernel/kprobes.c| 55 +++--
 kernel/trace/trace_kprobe.c | 20 +-
 4 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 4fd0daa54e6c..caa459964f09 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -52,9 +52,9 @@ config GENERIC_ENTRY
 
 config KPROBES
bool "Kprobes"
-   depends on MODULES
depends on HAVE_KPROBES
select KALLSYMS
+   select EXECMEM
select TASKS_RCU if PREEMPTION
help
  Kprobes allows you to trap at almost any kernel address and
diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..ffa1c603163c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -605,6 +605,11 @@ static inline bool module_is_live(struct module *mod)
return mod->state != MODULE_STATE_GOING;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+return mod->state == MODULE_STATE_COMING;
+}
+
 struct module *__module_text_address(unsigned long addr);
 struct module *__module_address(unsigned long addr);
 bool is_module_address(unsigned long addr);
@@ -857,6 +862,10 @@ void *dereference_module_function_descriptor(struct module 
*mod, void *ptr)
return ptr;
 }
 
+static inline bool module_is_coming(struct module *mod)
+{
+   return false;
+}
 #endif /* CONFIG_MODULES */
 
 #ifdef CONFIG_SYSFS
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ddd7cdc16edf..ca2c6cbd42d2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1588,7 +1588,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
}
 
/* Get module refcount and reject __init functions for loaded modules. 
*/
-   if (*probed_mod) {
+   if (IS_ENABLED(CONFIG_MODULES) && *probed_mod) {
/*
 * We must hold a refcount of the probed module while updating
 * its code to prohibit unexpected unloading.
@@ -1603,12 +1603,13 @@ static int check_kprobe_address_safe(struct kprobe *p,
 * kprobes in there.
 */
if (within_module_init((unsigned long)p->addr, *probed_mod) &&
-   (*probed_mod)->state != MODULE_STATE_COMING) {
+   !module_is_coming(*probed_mod)) {
module_put(*probed_mod);
*probed_mod = NULL;
ret = -ENOENT;
}
}
+
 out:
preempt_enable();
jump_label_unlock();
@@ -2488,24 +2489,6 @@ int kprobe_add_area_blacklist(unsigned long start, 
unsigned long end)
return 0;
 }
 
-/* Remove all symbols in given area from kprobe blacklist */
-static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
-{
-   struct kprobe_blacklist_entry *ent, *n;
-
-   list_for_each_entry_safe(ent, n, _blacklist, list) {
-   if (ent->start_addr < start || ent->start_addr >= end)
-   continue;
-   list_del(>list);
-   kfree(ent);
-   }
-}
-
-static void kprobe_remove_ksym_blacklist(unsigned long entry)
-{
-   kprobe_remove_area_blacklist(entry, entry + 1);
-}
-
 int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
   char *type, char *sym)
 {
@@ -2570,6 +2553,25 @@ static int __init populate_kprobe_blacklist(unsigned 
long *start,
return ret ? : arch_populate_kprobe_blacklist();
 }
 
+#ifdef CONFIG_MODULES
+/* Remove all symbols in given area from kprobe blacklist */
+static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
+{
+   struct kprobe_blacklist_entry *ent, *n;
+
+   list_for_each_entry_safe(ent, n, _blacklist, list) {
+   if (ent->start_addr < start || ent->start_addr >= end)
+   continue;
+   list_del(>list);
+   kfree(ent);
+   }
+}
+
+static void kprobe_remove_ksym_blacklist(unsigned long entry)
+{
+   kprobe_remove_area_blacklist(entry, entry + 1);
+}
+
 static void add_module_kprobe_blacklist(struct module *mod)
 {
unsigned long start, end;
@@ -2672,6 +2674,17 @@ static struct notifier_block kprobe_module_nb = {
.priority = 0
 };
 
+static int kprobe_register_module_notifier(void)
+{
+   return register_module_notifier(_module_nb);
+}
+#else
+static int kprobe_register_module_notifier(void)
+{
+   

[PATCH v6 14/16] powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

There are places where CONFIG_MODULES guards the code that depends on
memory allocation being done with module_alloc().

Replace CONFIG_MODULES with CONFIG_EXECMEM in such places.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/include/asm/kasan.h | 2 +-
 arch/powerpc/kernel/head_8xx.S   | 4 ++--
 arch/powerpc/kernel/head_book3s_32.S | 6 +++---
 arch/powerpc/lib/code-patching.c | 2 +-
 arch/powerpc/mm/book3s32/mmu.c   | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..2e586733a464 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -285,7 +285,7 @@ config PPC
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
-   select KASAN_VMALLOCif KASAN && MODULES
+   select KASAN_VMALLOCif KASAN && EXECMEM
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_PAGE_SIZE
select MMU_GATHER_RCU_TABLE_FREE
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 365d2720097c..b5bbb94c51f6 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -19,7 +19,7 @@
 
 #define KASAN_SHADOW_SCALE_SHIFT   3
 
-#if defined(CONFIG_MODULES) && defined(CONFIG_PPC32)
+#if defined(CONFIG_EXECMEM) && defined(CONFIG_PPC32)
 #define KASAN_KERN_START   ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
 #else
 #define KASAN_KERN_START   PAGE_OFFSET
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 647b0b445e89..edc479a7c2bc 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -199,12 +199,12 @@ instruction_counter:
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r10, r11)
mtspr   SPRN_MD_EPN, r10
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
mfcrr11
compare_to_kernel_boundary r10, r10
 #endif
mfspr   r10, SPRN_M_TWB /* Get level 1 table */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
blt+3f
rlwinm  r10, r10, 0, 20, 31
orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index c1d89764dd22..57196883a00e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -419,14 +419,14 @@ InstructionTLBMiss:
  */
/* Get PTE (linux-style) and check access */
mfspr   r3,SPRN_IMISS
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw   0,r1,r3
 #endif
mfspr   r2, SPRN_SDR1
li  r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
rlwinm  r2, r2, 28, 0xf000
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
li  r0, 3
bgt-112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
@@ -442,7 +442,7 @@ InstructionTLBMiss:
andc.   r1,r1,r2/* check access & ~permission */
bne-InstructionAddressInvalid /* return if access not permitted */
/* Convert linux-style PTE to low word of PPC-style PTE */
-#ifdef CONFIG_MODULES
+#ifdef CONFIG_EXECMEM
rlwimi  r2, r0, 0, 31, 31   /* userspace ? -> PP lsb */
 #endif
ori r1, r1, 0xe06   /* clear out reserved bits */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..7af791446ddf 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -225,7 +225,7 @@ void __init poking_init(void)
 
 static unsigned long get_patch_pfn(void *addr)
 {
-   if (IS_ENABLED(CONFIG_MODULES) && is_vmalloc_or_module_addr(addr))
+   if (IS_ENABLED(CONFIG_EXECMEM) && is_vmalloc_or_module_addr(addr))
return vmalloc_to_pfn(addr);
else
return __pa_symbol(addr) >> PAGE_SHIFT;
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 100f999871bc..625fe7d08e06 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -184,7 +184,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 
 static bool is_module_segment(unsigned long addr)
 {
-   if (!IS_ENABLED(CONFIG_MODULES))
+   if (!IS_ENABLED(CONFIG_EXECMEM))
return false;
if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
return false;
-- 
2.43.0




[PATCH v6 13/16] x86/ftrace: enable dynamic ftrace without CONFIG_MODULES

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Dynamic ftrace must allocate memory for code and this was impossible
without CONFIG_MODULES.

With execmem separated from the modules code, execmem_text_alloc() is
available regardless of CONFIG_MODULES.

Remove dependency of dynamic ftrace on CONFIG_MODULES and make
CONFIG_DYNAMIC_FTRACE select CONFIG_EXECMEM in Kconfig.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/x86/Kconfig |  1 +
 arch/x86/kernel/ftrace.c | 10 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4474bf32d0a4..f2917ccf4fb4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,6 +34,7 @@ config X86_64
select SWIOTLB
select ARCH_HAS_ELFCORE_COMPAT
select ZONE_DMA32
+   select EXECMEM if DYNAMIC_FTRACE
 
 config FORCE_DYNAMIC_FTRACE
def_bool y
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index c8ddb7abda7c..8da0e66ca22d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -261,8 +261,6 @@ void arch_ftrace_update_code(int command)
 /* Currently only x86_64 supports dynamic trampolines */
 #ifdef CONFIG_X86_64
 
-#ifdef CONFIG_MODULES
-/* Module allocation simplifies allocating memory for code */
 static inline void *alloc_tramp(unsigned long size)
 {
return execmem_alloc(EXECMEM_FTRACE, size);
@@ -271,14 +269,6 @@ static inline void tramp_free(void *tramp)
 {
execmem_free(tramp);
 }
-#else
-/* Trampolines can only be created if modules are supported */
-static inline void *alloc_tramp(unsigned long size)
-{
-   return NULL;
-}
-static inline void tramp_free(void *tramp) { }
-#endif
 
 /* Defined as markers to the end of the ftrace default trampolines */
 extern void ftrace_regs_caller_end(void);
-- 
2.43.0




[PATCH v6 12/16] arch: make execmem setup available regardless of CONFIG_MODULES

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

execmem does not depend on modules, on the contrary modules use
execmem.

To make execmem available when CONFIG_MODULES=n, for instance for
kprobes, split execmem_params initialization out from
arch/*/kernel/module.c and compile it when CONFIG_EXECMEM=y

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm/kernel/module.c   |  43 --
 arch/arm/mm/init.c |  45 +++
 arch/arm64/kernel/module.c | 140 -
 arch/arm64/mm/init.c   | 140 +
 arch/loongarch/kernel/module.c |  19 -
 arch/loongarch/mm/init.c   |  21 +
 arch/mips/kernel/module.c  |  22 --
 arch/mips/mm/init.c|  23 ++
 arch/nios2/kernel/module.c |  20 -
 arch/nios2/mm/init.c   |  21 +
 arch/parisc/kernel/module.c|  20 -
 arch/parisc/mm/init.c  |  23 +-
 arch/powerpc/kernel/module.c   |  63 ---
 arch/powerpc/mm/mem.c  |  64 +++
 arch/riscv/kernel/module.c |  44 ---
 arch/riscv/mm/init.c   |  45 +++
 arch/s390/kernel/module.c  |  27 ---
 arch/s390/mm/init.c|  30 +++
 arch/sparc/kernel/module.c |  19 -
 arch/sparc/mm/Makefile |   2 +
 arch/sparc/mm/execmem.c|  21 +
 arch/x86/kernel/module.c   |  27 ---
 arch/x86/mm/init.c |  29 +++
 23 files changed, 463 insertions(+), 445 deletions(-)
 create mode 100644 arch/sparc/mm/execmem.c

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index a98fdf6ff26c..677f218f7e84 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -12,57 +12,14 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
 
 #include 
 #include 
 #include 
 #include 
 
-#ifdef CONFIG_XIP_KERNEL
-/*
- * The XIP kernel text is mapped in the module area for modules and
- * some other stuff to work without any indirect relocations.
- * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
- * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
- */
-#undef MODULES_VADDR
-#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
-#endif
-
-#ifdef CONFIG_MMU
-static struct execmem_info execmem_info __ro_after_init;
-
-struct execmem_info __init *execmem_arch_setup(void)
-{
-   unsigned long fallback_start = 0, fallback_end = 0;
-
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
-   fallback_start = VMALLOC_START;
-   fallback_end = VMALLOC_END;
-   }
-
-   execmem_info = (struct execmem_info){
-   .ranges = {
-   [EXECMEM_DEFAULT] = {
-   .start  = MODULES_VADDR,
-   .end= MODULES_END,
-   .pgprot = PAGE_KERNEL_EXEC,
-   .alignment = 1,
-   .fallback_start = fallback_start,
-   .fallback_end   = fallback_end,
-   },
-   },
-   };
-
-   return _info;
-}
-#endif
-
 bool module_init_section(const char *name)
 {
return strstarts(name, ".init") ||
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index e8c6f4be0ce1..5345d218899a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -486,3 +487,47 @@ void free_initrd_mem(unsigned long start, unsigned long 
end)
free_reserved_area((void *)start, (void *)end, -1, "initrd");
 }
 #endif
+
+#ifdef CONFIG_EXECMEM
+
+#ifdef CONFIG_XIP_KERNEL
+/*
+ * The XIP kernel text is mapped in the module area for modules and
+ * some other stuff to work without any indirect relocations.
+ * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
+ * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
+ */
+#undef MODULES_VADDR
+#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
+#endif
+
+#ifdef CONFIG_MMU
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
+{
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+   },
+   },
+

[PATCH v6 11/16] powerpc: extend execmem_params for kprobes allocations

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

powerpc overrides kprobes::alloc_insn_page() to remove writable
permissions when STRICT_MODULE_RWX is on.

Add definition of EXECMEM_KRPOBES to execmem_params to allow using the
generic kprobes::alloc_insn_page() with the desired permissions.

As powerpc uses breakpoint instructions to inject kprobes, it does not
need to constrain kprobe allocations to the modules area and can use the
entire vmalloc address space.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/kernel/kprobes.c | 20 
 arch/powerpc/kernel/module.c  |  7 +++
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9fcd01bb2ce6..14c5ddec3056 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -126,26 +126,6 @@ kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long 
addr, unsigned long offse
return (kprobe_opcode_t *)(addr + offset);
 }
 
-void *alloc_insn_page(void)
-{
-   void *page;
-
-   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
-   if (!page)
-   return NULL;
-
-   if (strict_module_rwx_enabled()) {
-   int err = set_memory_rox((unsigned long)page, 1);
-
-   if (err)
-   goto error;
-   }
-   return page;
-error:
-   execmem_free(page);
-   return NULL;
-}
-
 int arch_prepare_kprobe(struct kprobe *p)
 {
int ret = 0;
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index ac80559015a3..2a23cf7e141b 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -94,6 +94,7 @@ static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
 {
+   pgprot_t kprobes_prot = strict_module_rwx_enabled() ? PAGE_KERNEL_ROX : 
PAGE_KERNEL_EXEC;
pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : 
PAGE_KERNEL_EXEC;
unsigned long fallback_start = 0, fallback_end = 0;
unsigned long start, end;
@@ -132,6 +133,12 @@ struct execmem_info __init *execmem_arch_setup(void)
.fallback_start = fallback_start,
.fallback_end   = fallback_end,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = kprobes_prot,
+   .alignment = 1,
+   },
[EXECMEM_MODULE_DATA] = {
.start  = VMALLOC_START,
.end= VMALLOC_END,
-- 
2.43.0




[PATCH v6 10/16] arm64: extend execmem_info for generated code allocations

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The memory allocations for kprobes and BPF on arm64 can be placed
anywhere in vmalloc address space and currently this is implemented with
overrides of alloc_insn_page() and bpf_jit_alloc_exec() in arm64.

Define EXECMEM_KPROBES and EXECMEM_BPF ranges in arm64::execmem_info and
drop overrides of alloc_insn_page() and bpf_jit_alloc_exec().

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Will Deacon 
---
 arch/arm64/kernel/module.c | 12 
 arch/arm64/kernel/probes/kprobes.c |  7 ---
 arch/arm64/net/bpf_jit_comp.c  | 11 ---
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index b7a7a23f9f8f..a52240ea084b 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -146,6 +146,18 @@ struct execmem_info __init *execmem_arch_setup(void)
.fallback_start = fallback_start,
.fallback_end   = fallback_end,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL_ROX,
+   .alignment = 1,
+   },
+   [EXECMEM_BPF] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
},
};
 
diff --git a/arch/arm64/kernel/probes/kprobes.c 
b/arch/arm64/kernel/probes/kprobes.c
index 327855a11df2..4268678d0e86 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -129,13 +129,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
return 0;
 }
 
-void *alloc_insn_page(void)
-{
-   return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS,
-   NUMA_NO_NODE, __builtin_return_address(0));
-}
-
 /* arm kprobe: install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 122021f9bdfc..456f5af239fc 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1793,17 +1793,6 @@ u64 bpf_jit_alloc_exec_limit(void)
return VMALLOC_END - VMALLOC_START;
 }
 
-void *bpf_jit_alloc_exec(unsigned long size)
-{
-   /* Memory is intended to be executable, reset the pointer tag. */
-   return kasan_reset_tag(vmalloc(size));
-}
-
-void bpf_jit_free_exec(void *addr)
-{
-   return vfree(addr);
-}
-
 /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool bpf_jit_supports_subprog_tailcalls(void)
 {
-- 
2.43.0




[PATCH v6 09/16] riscv: extend execmem_params for generated code allocations

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

The memory allocations for kprobes and BPF on RISC-V are not placed in
the modules area and these custom allocations are implemented with
overrides of alloc_insn_page() and  bpf_jit_alloc_exec().

Slightly reorder execmem_params initialization to support both 32 and 64
bit variants, define EXECMEM_KPROBES and EXECMEM_BPF ranges in
riscv::execmem_params and drop overrides of alloc_insn_page() and
bpf_jit_alloc_exec().

Signed-off-by: Mike Rapoport (IBM) 
Reviewed-by: Alexandre Ghiti 
---
 arch/riscv/kernel/module.c | 28 +---
 arch/riscv/kernel/probes/kprobes.c | 10 --
 arch/riscv/net/bpf_jit_core.c  | 13 -
 3 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
index 182904127ba0..2ecbacbc9993 100644
--- a/arch/riscv/kernel/module.c
+++ b/arch/riscv/kernel/module.c
@@ -906,19 +906,41 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
+#ifdef CONFIG_MMU
 static struct execmem_info execmem_info __ro_after_init;
 
 struct execmem_info __init *execmem_arch_setup(void)
 {
+   unsigned long start, end;
+
+   if (IS_ENABLED(CONFIG_64BIT)) {
+   start = MODULES_VADDR;
+   end = MODULES_END;
+   } else {
+   start = VMALLOC_START;
+   end = VMALLOC_END;
+   }
+
execmem_info = (struct execmem_info){
.ranges = {
[EXECMEM_DEFAULT] = {
-   .start  = MODULES_VADDR,
-   .end= MODULES_END,
+   .start  = start,
+   .end= end,
.pgprot = PAGE_KERNEL,
.alignment = 1,
},
+   [EXECMEM_KPROBES] = {
+   .start  = VMALLOC_START,
+   .end= VMALLOC_END,
+   .pgprot = PAGE_KERNEL_READ_EXEC,
+   .alignment = 1,
+   },
+   [EXECMEM_BPF] = {
+   .start  = BPF_JIT_REGION_START,
+   .end= BPF_JIT_REGION_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = PAGE_SIZE,
+   },
},
};
 
diff --git a/arch/riscv/kernel/probes/kprobes.c 
b/arch/riscv/kernel/probes/kprobes.c
index 2f08c14a933d..e64f2f3064eb 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -104,16 +104,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
return 0;
 }
 
-#ifdef CONFIG_MMU
-void *alloc_insn_page(void)
-{
-   return  __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
-GFP_KERNEL, PAGE_KERNEL_READ_EXEC,
-VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-__builtin_return_address(0));
-}
-#endif
-
 /* install breakpoint in text */
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 6b3acac30c06..e238fdbd5dbc 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -219,19 +219,6 @@ u64 bpf_jit_alloc_exec_limit(void)
return BPF_JIT_REGION_SIZE;
 }
 
-void *bpf_jit_alloc_exec(unsigned long size)
-{
-   return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
-   BPF_JIT_REGION_END, GFP_KERNEL,
-   PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-}
-
-void bpf_jit_free_exec(void *addr)
-{
-   return vfree(addr);
-}
-
 void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
int ret;
-- 
2.43.0




[PATCH v6 08/16] mm/execmem, arch: convert remaining overrides of module_alloc to execmem

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Extend execmem parameters to accommodate more complex overrides of
module_alloc() by architectures.

This includes specification of a fallback range required by arm, arm64
and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for
allocation of KASAN shadow required by s390 and x86 and support for
late initialization of execmem required by arm64.

The core implementation of execmem_alloc() takes care of suppressing
warnings when the initial allocation fails but there is a fallback range
defined.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Will Deacon 
---
 arch/Kconfig |  8 
 arch/arm/kernel/module.c | 41 
 arch/arm64/Kconfig   |  1 +
 arch/arm64/kernel/module.c   | 55 ++
 arch/powerpc/kernel/module.c | 60 +++--
 arch/s390/kernel/module.c| 54 +++---
 arch/x86/kernel/module.c | 70 +++--
 include/linux/execmem.h  | 30 ++-
 include/linux/moduleloader.h | 12 --
 kernel/module/main.c | 26 +++--
 mm/execmem.c | 75 ++--
 11 files changed, 247 insertions(+), 185 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 65afb1de48b3..4fd0daa54e6c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -960,6 +960,14 @@ config ARCH_WANTS_MODULES_DATA_IN_VMALLOC
  For architectures like powerpc/32 which have constraints on module
  allocation and need to allocate module data outside of module area.
 
+config ARCH_WANTS_EXECMEM_LATE
+   bool
+   help
+ For architectures that do not allocate executable memory early on
+ boot, but rather require its initialization late when there is
+ enough entropy for module space randomization, for instance
+ arm64.
+
 config HAVE_IRQ_EXIT_ON_IRQ_STACK
bool
help
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index e74d84f58b77..a98fdf6ff26c 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,23 +35,31 @@
 #endif
 
 #ifdef CONFIG_MMU
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   gfp_t gfp_mask = GFP_KERNEL;
-   void *p;
-
-   /* Silence the initial allocation */
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
-   gfp_mask |= __GFP_NOWARN;
-
-   p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-   if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
-   return p;
-   return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   unsigned long fallback_start = 0, fallback_end = 0;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   fallback_start = VMALLOC_START;
+   fallback_end = VMALLOC_END;
+   }
+
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   .fallback_start = fallback_start,
+   .fallback_end   = fallback_end,
+   },
+   },
+   };
+
+   return _info;
 }
 #endif
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..74b34a78b7ac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -105,6 +105,7 @@ config ARM64
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES 
&& !ARM64_VA_BITS_36)
select ARCH_WANT_LD_ORPHAN_WARN
+   select ARCH_WANTS_EXECMEM_LATE if EXECMEM
select ARCH_WANTS_NO_INSTR
select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
select ARCH_HAS_UBSAN
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index e92da4da1b2a..b7a7a23f9f8f 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -108,41 +109,47 @@ static int __init module_init_limits(void)
 
return 0;
 }
-subsys_initcall(module_init_limits);
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   void *p = NULL;
+   

[PATCH v6 07/16] mm/execmem, arch: convert simple overrides of module_alloc to execmem

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Several architectures override module_alloc() only to define address
range for code allocations different than VMALLOC address space.

Provide a generic implementation in execmem that uses the parameters for
address space ranges, required alignment and page protections provided
by architectures.

The architectures must fill execmem_info structure and implement
execmem_arch_setup() that returns a pointer to that structure. This way the
execmem initialization won't be called from every architecture, but rather
from a central place, namely a core_initcall() in execmem.

The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
with the parameters defined by the architectures.  If an architecture does
not implement execmem_arch_setup(), execmem_alloc() will fall back to
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/loongarch/kernel/module.c | 19 --
 arch/mips/kernel/module.c  | 20 --
 arch/nios2/kernel/module.c | 21 ---
 arch/parisc/kernel/module.c| 24 
 arch/riscv/kernel/module.c | 24 
 arch/sparc/kernel/module.c | 20 --
 include/linux/execmem.h| 47 
 mm/execmem.c   | 67 --
 mm/mm_init.c   |  2 +
 9 files changed, 210 insertions(+), 34 deletions(-)

diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
index c7d0338d12c1..ca6dd7ea1610 100644
--- a/arch/loongarch/kernel/module.c
+++ b/arch/loongarch/kernel/module.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -490,10 +491,22 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, 
__builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 
 static void module_init_ftrace_plt(const Elf_Ehdr *hdr,
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 9a6c96014904..59225a3cf918 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct mips_hi16 {
@@ -32,11 +33,22 @@ static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
 #ifdef MODULES_VADDR
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 #endif
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 9c97b7513853..0d1ee86631fc 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -18,15 +18,26 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
-void *module_alloc(unsigned long size)
+static struct execmem_info execmem_info __ro_after_init;
+
+struct execmem_info __init *execmem_arch_setup(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC,
-   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   execmem_info = (struct execmem_info){
+   .ranges = {
+   [EXECMEM_DEFAULT] = {
+   .start  = MODULES_VADDR,
+   .end= MODULES_END,
+   .pgprot = PAGE_KERNEL_EXEC,
+   .alignment = 1,
+   },
+   },
+   };
+
+   return _info;
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
diff --git a/arch/parisc/kernel/module.c 

[PATCH v6 06/16] mm: introduce execmem_alloc() and execmem_free()

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystems
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

Start splitting code allocation from modules by introducing execmem_alloc()
and execmem_free() APIs.

Initially, execmem_alloc() is a wrapper for module_alloc() and
execmem_free() is a replacement of module_memfree() to allow updating all
call sites to use the new APIs.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem_alloc() takes
a type argument, that will be used to identify the calling subsystem and to
allow architectures define parameters for ranges suitable for that
subsystem.

No functional changes.

Signed-off-by: Mike Rapoport (IBM) 
Acked-by: Masami Hiramatsu (Google) 
---
 arch/powerpc/kernel/kprobes.c|  6 ++--
 arch/s390/kernel/ftrace.c|  4 +--
 arch/s390/kernel/kprobes.c   |  4 +--
 arch/s390/kernel/module.c|  5 +--
 arch/sparc/net/bpf_jit_comp_32.c |  8 ++---
 arch/x86/kernel/ftrace.c |  6 ++--
 arch/x86/kernel/kprobes/core.c   |  4 +--
 include/linux/execmem.h  | 57 
 include/linux/moduleloader.h |  3 --
 kernel/bpf/core.c|  6 ++--
 kernel/kprobes.c |  8 ++---
 kernel/module/Kconfig|  1 +
 kernel/module/main.c | 25 +-
 mm/Kconfig   |  3 ++
 mm/Makefile  |  1 +
 mm/execmem.c | 32 ++
 16 files changed, 128 insertions(+), 45 deletions(-)
 create mode 100644 include/linux/execmem.h
 create mode 100644 mm/execmem.c

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index bbca90a5e2ec..9fcd01bb2ce6 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -19,8 +19,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -130,7 +130,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
 
@@ -142,7 +142,7 @@ void *alloc_insn_page(void)
}
return page;
 error:
-   module_memfree(page);
+   execmem_free(page);
return NULL;
 }
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..798249ef5646 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -7,13 +7,13 @@
  *   Author(s): Martin Schwidefsky 
  */
 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -220,7 +220,7 @@ static int __init ftrace_plt_init(void)
 {
const char *start, *end;
 
-   ftrace_plt = module_alloc(PAGE_SIZE);
+   ftrace_plt = execmem_alloc(EXECMEM_FTRACE, PAGE_SIZE);
if (!ftrace_plt)
panic("cannot allocate ftrace plt\n");
 
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index f0cf20d4b3c5..3c1b1be744de 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -9,7 +9,6 @@
 
 #define pr_fmt(fmt) "kprobes: " fmt
 
-#include 
 #include 
 #include 
 #include 
@@ -21,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -38,7 +38,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
if (!page)
return NULL;
set_memory_rox((unsigned long)page, 1);
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 42215f9404af..ac97a905e8cd 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,7 +77,7 @@ void *module_alloc(unsigned long size)
 #ifdef CONFIG_FUNCTION_TRACER
 void module_arch_cleanup(struct module *mod)
 {
-   module_memfree(mod->arch.trampolines_start);
+   execmem_free(mod->arch.trampolines_start);
 }
 #endif
 
@@ -510,7 +511,7 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct 
module *me,
 
size = FTRACE_HOTPATCH_TRAMPOLINES_SIZE(s->sh_size);
numpages = DIV_ROUND_UP(size, PAGE_SIZE);
-   start = module_alloc(numpages * PAGE_SIZE);
+   start = execmem_alloc(EXECMEM_FTRACE, numpages * PAGE_SIZE);
if (!start)
return -ENOMEM;

[PATCH v6 05/16] module: make module_memory_{alloc,free} more self-contained

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Move the logic related to the memory allocation and freeing into
module_memory_alloc() and module_memory_free().

Signed-off-by: Mike Rapoport (IBM) 
---
 kernel/module/main.c | 64 +++-
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..5b82b069e0d3 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1203,15 +1203,44 @@ static bool mod_mem_use_vmalloc(enum mod_mem_type type)
mod_mem_type_is_core_data(type);
 }
 
-static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
+static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
 {
+   unsigned int size = PAGE_ALIGN(mod->mem[type].size);
+   void *ptr;
+
+   mod->mem[type].size = size;
+
if (mod_mem_use_vmalloc(type))
-   return vzalloc(size);
-   return module_alloc(size);
+   ptr = vmalloc(size);
+   else
+   ptr = module_alloc(size);
+
+   if (!ptr)
+   return -ENOMEM;
+
+   /*
+* The pointer to these blocks of memory are stored on the module
+* structure and we keep that around so long as the module is
+* around. We only free that memory when we unload the module.
+* Just mark them as not being a leak then. The .init* ELF
+* sections *do* get freed after boot so we *could* treat them
+* slightly differently with kmemleak_ignore() and only grey
+* them out as they work as typical memory allocations which
+* *do* eventually get freed, but let's just keep things simple
+* and avoid *any* false positives.
+*/
+   kmemleak_not_leak(ptr);
+
+   memset(ptr, 0, size);
+   mod->mem[type].base = ptr;
+
+   return 0;
 }
 
-static void module_memory_free(void *ptr, enum mod_mem_type type)
+static void module_memory_free(struct module *mod, enum mod_mem_type type)
 {
+   void *ptr = mod->mem[type].base;
+
if (mod_mem_use_vmalloc(type))
vfree(ptr);
else
@@ -1229,12 +1258,12 @@ static void free_mod_mem(struct module *mod)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
-   module_memory_free(mod_mem->base, type);
+   module_memory_free(mod, type);
}
 
/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, 
mod->mem[MOD_DATA].size);
-   module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
+   module_memory_free(mod, MOD_DATA);
 }
 
 /* Free a module, remove from lists, etc. */
@@ -2225,7 +2254,6 @@ static int find_module_sections(struct module *mod, 
struct load_info *info)
 static int move_module(struct module *mod, struct load_info *info)
 {
int i;
-   void *ptr;
enum mod_mem_type t = 0;
int ret = -ENOMEM;
 
@@ -2234,26 +2262,12 @@ static int move_module(struct module *mod, struct 
load_info *info)
mod->mem[type].base = NULL;
continue;
}
-   mod->mem[type].size = PAGE_ALIGN(mod->mem[type].size);
-   ptr = module_memory_alloc(mod->mem[type].size, type);
-   /*
- * The pointer to these blocks of memory are stored on the 
module
- * structure and we keep that around so long as the module is
- * around. We only free that memory when we unload the module.
- * Just mark them as not being a leak then. The .init* ELF
- * sections *do* get freed after boot so we *could* treat them
- * slightly differently with kmemleak_ignore() and only grey
- * them out as they work as typical memory allocations which
- * *do* eventually get freed, but let's just keep things simple
- * and avoid *any* false positives.
-*/
-   kmemleak_not_leak(ptr);
-   if (!ptr) {
+
+   ret = module_memory_alloc(mod, type);
+   if (ret) {
t = type;
goto out_enomem;
}
-   memset(ptr, 0, mod->mem[type].size);
-   mod->mem[type].base = ptr;
}
 
/* Transfer each section which specifies SHF_ALLOC */
@@ -2296,7 +2310,7 @@ static int move_module(struct module *mod, struct 
load_info *info)
return 0;
 out_enomem:
for (t--; t >= 0; t--)
-   module_memory_free(mod->mem[t].base, t);
+   module_memory_free(mod, t);
return ret;
 }
 
-- 
2.43.0




[PATCH v6 04/16] sparc: simplify module_alloc()

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END
for 32-bit and reduce module_alloc() to

__vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, ...)

as with the new defines the allocations becomes identical for both 32
and 64 bits.

While on it, drop unused include of 

Suggested-by: Sam Ravnborg 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/sparc/include/asm/pgtable_32.h |  2 ++
 arch/sparc/kernel/module.c  | 25 +
 2 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_32.h 
b/arch/sparc/include/asm/pgtable_32.h
index 9e85d57ac3f2..62bcafe38b1f 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -432,6 +432,8 @@ static inline int io_remap_pfn_range(struct vm_area_struct 
*vma,
 
 #define VMALLOC_START   _AC(0xfe60,UL)
 #define VMALLOC_END _AC(0xffc0,UL)
+#define MODULES_VADDR   VMALLOC_START
+#define MODULES_END VMALLOC_END
 
 /* We provide our own get_unmapped_area to cope with VA holes for userland */
 #define HAVE_ARCH_UNMAPPED_AREA
diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
index 66c45a2764bc..d37adb2a0b54 100644
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -21,35 +21,12 @@
 
 #include "entry.h"
 
-#ifdef CONFIG_SPARC64
-
-#include 
-
-static void *module_map(unsigned long size)
+void *module_alloc(unsigned long size)
 {
-   if (PAGE_ALIGN(size) > MODULES_LEN)
-   return NULL;
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
-#else
-static void *module_map(unsigned long size)
-{
-   return vmalloc(size);
-}
-#endif /* CONFIG_SPARC64 */
-
-void *module_alloc(unsigned long size)
-{
-   void *ret;
-
-   ret = module_map(size);
-   if (ret)
-   memset(ret, 0, size);
-
-   return ret;
-}
 
 /* Make generic code ignore STT_REGISTER dummy undefined symbols.  */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
-- 
2.43.0




[PATCH v6 03/16] nios2: define virtual address space for modules

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

nios2 uses kmalloc() to implement module_alloc() because CALL26/PCREL26
cannot reach all of vmalloc address space.

Define module space as 32MiB below the kernel base and switch nios2 to
use vmalloc for module allocations.

Suggested-by: Thomas Gleixner 
Acked-by: Dinh Nguyen 
Acked-by: Song Liu 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/nios2/include/asm/pgtable.h |  5 -
 arch/nios2/kernel/module.c   | 19 ---
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index d052dfcbe8d3..eab87c6beacb 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -25,7 +25,10 @@
 #include 
 
 #define VMALLOC_START  CONFIG_NIOS2_KERNEL_MMU_REGION_BASE
-#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
+#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M - 1)
+
+#define MODULES_VADDR  (CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M)
+#define MODULES_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
 
 struct mm_struct;
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 76e0a42d6e36..9c97b7513853 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -21,23 +21,12 @@
 
 #include 
 
-/*
- * Modules should NOT be allocated with kmalloc for (obvious) reasons.
- * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
- * from 0x8000 (vmalloc area) to 0xc (kernel) (kmalloc returns
- * addresses in 0xc000)
- */
 void *module_alloc(unsigned long size)
 {
-   if (size == 0)
-   return NULL;
-   return kmalloc(size, GFP_KERNEL);
-}
-
-/* Free memory returned from module_alloc */
-void module_memfree(void *module_region)
-{
-   kfree(module_region);
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+   GFP_KERNEL, PAGE_KERNEL_EXEC,
+   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
-- 
2.43.0




[PATCH v6 02/16] mips: module: rename MODULE_START to MODULES_VADDR

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

and MODULE_END to MODULES_END to match other architectures that define
custom address space for modules.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/mips/include/asm/pgtable-64.h | 4 ++--
 arch/mips/kernel/module.c  | 4 ++--
 arch/mips/mm/fault.c   | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/pgtable-64.h 
b/arch/mips/include/asm/pgtable-64.h
index 20ca48c1b606..c0109aff223b 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -147,8 +147,8 @@
 #if defined(CONFIG_MODULES) && defined(KBUILD_64BIT_SYM32) && \
VMALLOC_START != CKSSEG
 /* Load modules into 32bit-compatible segment. */
-#define MODULE_START   CKSSEG
-#define MODULE_END (FIXADDR_START-2*PAGE_SIZE)
+#define MODULES_VADDR  CKSSEG
+#define MODULES_END(FIXADDR_START-2*PAGE_SIZE)
 #endif
 
 #define pte_ERROR(e) \
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 7b2fbaa9cac5..9a6c96014904 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -31,10 +31,10 @@ struct mips_hi16 {
 static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
-#ifdef MODULE_START
+#ifdef MODULES_VADDR
 void *module_alloc(unsigned long size)
 {
-   return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
 }
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index aaa9a242ebba..37fedeaca2e9 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -83,8 +83,8 @@ static void __do_page_fault(struct pt_regs *regs, unsigned 
long write,
 
if (unlikely(address >= VMALLOC_START && address <= VMALLOC_END))
goto VMALLOC_FAULT_TARGET;
-#ifdef MODULE_START
-   if (unlikely(address >= MODULE_START && address < MODULE_END))
+#ifdef MODULES_VADDR
+   if (unlikely(address >= MODULES_VADDR && address < MODULES_END))
goto VMALLOC_FAULT_TARGET;
 #endif
 
-- 
2.43.0




[PATCH v6 01/16] arm64: module: remove unneeded call to kasan_alloc_module_shadow()

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Since commit f6f37d9320a1 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS
modes") KASAN_VMALLOC is always enabled when KASAN is on. This means
that allocations in module_alloc() will be tracked by KASAN protection
for vmalloc() and that kasan_alloc_module_shadow() will be always an
empty inline and there is no point in calling it.

Drop meaningless call to kasan_alloc_module_shadow() from
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm64/kernel/module.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 47e0be610bb6..e92da4da1b2a 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -141,11 +141,6 @@ void *module_alloc(unsigned long size)
__func__);
}
 
-   if (p && (kasan_alloc_module_shadow(p, size, GFP_KERNEL) < 0)) {
-   vfree(p);
-   return NULL;
-   }
-
/* Memory is intended to be executable, reset the pointer tag. */
return kasan_reset_tag(p);
 }
-- 
2.43.0




[PATCH v6 00/16] mm: jit/text allocator

2024-04-26 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Hi,

The patches are also available in git:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v6

v6 changes:
* restore patch "arm64: extend execmem_info for generated code
  allocations" that disappeared in v5 rebase
* update execmem initialization so that by default it will be
  initialized early while late initialization will be an opt-in

v5: https://lore.kernel.org/all/20240422094436.3625171-1-r...@kernel.org
* rebase on v6.9-rc4 to avoid a conflict in kprobes
* add copyrights to mm/execmem.c (Luis)
* fix spelling (Ingo)
* define MODULES_VADDDR for sparc (Sam)
* consistently initialize struct execmem_info (Peter)
* reduce #ifdefs in function bodies in kprobes (Masami) 

v4: https://lore.kernel.org/all/20240411160051.2093261-1-r...@kernel.org
* rebase on v6.9-rc2
* rename execmem_params to execmem_info and execmem_arch_params() to
  execmem_arch_setup()
* use single execmem_alloc() API instead of execmem_{text,data}_alloc() (Song)
* avoid extra copy of execmem parameters (Rick)
* run execmem_init() as core_initcall() except for the architectures that
  may allocated text really early (currently only x86) (Will)
* add acks for some of arm64 and riscv changes, thanks Will and Alexandre
* new commits:
  - drop call to kasan_alloc_module_shadow() on arm64 because it's not
needed anymore
  - rename MODULE_START to MODULES_VADDR on MIPS
  - use CONFIG_EXECMEM instead of CONFIG_MODULES on powerpc as per Christophe:
https://lore.kernel.org/all/79062fa3-3402-47b3-8920-9231ad05e...@csgroup.eu/

v3: https://lore.kernel.org/all/20230918072955.2507221-1-r...@kernel.org
* add type parameter to execmem allocation APIs
* remove BPF dependency on modules

v2: https://lore.kernel.org/all/20230616085038.4121892-1-r...@kernel.org
* Separate "module" and "others" allocations with execmem_text_alloc()
and jit_text_alloc()
* Drop ROX entailment on x86
* Add ack for nios2 changes, thanks Dinh Nguyen

v1: https://lore.kernel.org/all/20230601101257.530867-1-r...@kernel.org

= Cover letter from v1 (sligtly updated) =

module_alloc() is used everywhere as a mean to allocate memory for code.

Beside being semantically wrong, this unnecessarily ties all subsystmes
that need to allocate code, such as ftrace, kprobes and BPF to modules and
puts the burden of code allocation to the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

A centralized infrastructure for code allocation allows allocations of
executable memory as ROX, and future optimizations such as caching large
pages for better iTLB performance and providing sub-page allocations for
users that only need small jit code snippets.

Rick Edgecombe proposed perm_alloc extension to vmalloc [1] and Song Liu
proposed execmem_alloc [2], but both these approaches were targeting BPF
allocations and lacked the ground work to abstract executable allocations
and split them from the modules core.

Thomas Gleixner suggested to express module allocation restrictions and
requirements as struct mod_alloc_type_params [3] that would define ranges,
protections and other parameters for different types of allocations used by
modules and following that suggestion Song separated allocations of
different types in modules (commit ac3b43283923 ("module: replace
module_layout with module_memory")) and posted "Type aware module
allocator" set [4].

I liked the idea of parametrising code allocation requirements as a
structure, but I believe the original proposal and Song's module allocator
was too module centric, so I came up with these patches.

This set splits code allocation from modules by introducing execmem_alloc()
and and execmem_free(), APIs, replaces call sites of module_alloc() and
module_memfree() with the new APIs and implements core text and related
allocations in a central place.

Instead of architecture specific overrides for module_alloc(), the
architectures that require non-default behaviour for text allocation must
fill execmem_info structure and implement execmem_arch_setup() that returns
a pointer to that structure. If an architecture does not implement
execmem_arch_setup(), the defaults compatible with the current
modules::module_alloc() are used.

Since architectures define different restrictions on placement,
permissions, alignment and other parameters for memory that can be used by
different subsystems that allocate executable memory, execmem APIs
take a type argument, that will be used to identify the calling subsystem
and to allow architectures to define parameters for ranges suitable for that
subsystem.

The new infrastructure allows decoupling of BPF, kprobes and ftrace from
modules, and most importantly it paves the way for ROX allocations for
executable memory.

[1] 
https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgeco...@intel.com/
[2] 

[PATCH] tracing: Fix uaf issue in tracing_open_file_tr

2024-04-26 Thread Tze-nan wu
"tracing_event_file" is at the risk of use-after-free due to the race of
two functions "tracing_open_file_tr" and "synth_event_release".
Specifically, it could be freed by synth_event_release before
tracing_open_file_tr has the opportunity to access its members.

It's easy to reproduced by first executing script 'A' and then script 'B'
in my environment.

  Script 'A':
echo "hello int aaa" > /sys/kernel/tracing/synthetic_events
while :
do
  echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable
done

  Script 'B':
echo > /sys/kernel/tracing/synthetic/events

  My environment:
arm64 + generic and swtag based KASAN both enabled + kernel-6.6.18

KASAN report
==
BUG: KASAN: slab-use-after-free in tracing_open_file_tr
Read of size 8 at addr 9*** by task sh/3485
Pointer tag: [9*], memory tag: [fe]

CPU: * PID: 3485 Comm: sh Tainted: 
Call trace:
 __hwasan_tag_mismatch
 tracing_open_file_tr
 do_dentry_open
 vfs_open
 path_openat

Freed by task 3490:
 slab_free_freelist_hook
 kmem_cache_free
 event_file_put
 remove_event_file_dir
 __trace_remove_event_call
 trace_remove_event_call
 synth_event_release
 dyn_events_release_all
 synth_events_open

page last allocated via order 0, migratetype Unmovable:
 __slab_alloc
 kmem_cache_alloc
 trace_create_new_event
 trace_add_event_call
 register_synth_event
 __create_synth_event
 create_or_delete_synth_event
 trace_parse_run_command
 synth_events_write
 vfs_write

Based on the assumption that eventfs_inode will persist throughout the
execution of the tracing_open_file_tr function,
we can fix this issue by incrementing the reference count of
trace_event_file once it is assigned to eventfs_inode->data.
The reference count will then be decremented during the release phase of
eventfs_inode.

This ensures that trace_event_file is not prematurely freed while the
tracing_open_file_tr function is being called.

Fixes: bb32500fb9b7 ("tracing: Have trace_event_file have ref counters")
Co-developed-by: Cheng-Jui Wang 
Signed-off-by: Cheng-Jui Wang 
Signed-off-by: Tze-nan wu 
---
BTW, I've also attempted to reproduce the same issue in another
environment (described below).
It's also reproducible but in a lower rate.

another environment:
  x86 + ubuntu + generic kasan enabled + kernel-6.9-rc4

After applying the patch, KASAN no longer reports any issues when
following the same reproduction steps in my arm64 environment. 
However, I am concerned about potential side effects that the patch might 
introduce.
Additionally, the newly added function "is_entry_event_callback" may not
be reliable in my opinion,
as the callback function used in the comparison could change in future. 
Nonetheless, this is the best solution I can come up with now.

Looking for any suggestion or solution, appreciate.

 fs/tracefs/event_inode.c | 3 +++
 include/linux/trace_events.h | 4 
 kernel/trace/trace_events.c  | 6 ++
 3 files changed, 13 insertions(+)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 894c6ca1e500..fd49c0f88500 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "internal.h"
@@ -90,6 +91,8 @@ static void release_ei(struct kref *ref)
 
kfree(ei->entry_attrs);
kfree_const(ei->name);
+   if (ei->nr_entries && is_entry_event_callback(ei->entries[0]))
+   event_file_put(ei->data);
if (ei->is_events) {
rei = get_root_inode(ei);
kfree_rcu(rei, ei.rcu);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 6f9bdfb09d1d..602e87682ee2 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct trace_array;
 struct array_buffer;
@@ -505,6 +506,7 @@ extern struct trace_event_file *trace_get_event_file(const 
char *instance,
 const char *system,
 const char *event);
 extern void trace_put_event_file(struct trace_event_file *file);
+bool is_entry_event_callback(struct eventfs_entry entry);
 
 #define MAX_DYNEVENT_CMD_LEN   (2048)
 
@@ -731,6 +733,8 @@ extern void
 event_triggers_post_call(struct trace_event_file *file,
 enum event_trigger_type tt);
 
+extern void event_file_put(struct trace_event_file *file);
+
 bool trace_event_ignore_this_pid(struct trace_event_file *trace_file);
 
 bool __trace_trigger_soft_disabled(struct trace_event_file *file);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 52f75c36bbca..de01676b59ff 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2626,6 +2626,7 @@ event_create_dir(struct eventfs_inode *parent, struct 
trace_event_file *file)

Re: [RFC][PATCH] uprobe: support for private hugetlb mappings

2024-04-26 Thread David Hildenbrand

On 26.04.24 02:09, Guillaume Morin wrote:

On 25 Apr 21:56, David Hildenbrand wrote:


On 25.04.24 17:19, Guillaume Morin wrote:

On 24 Apr 23:00, David Hildenbrand wrote:

One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for
hugetlb mappings. However this was also on my TODO and I have a draft
patch that implements it.


Yes, I documented it back then and added sanity checks in GUP code to fence
it off. Shouldn't be too hard to implement (famous last words) and would be
the cleaner thing to use here once I manage to switch over to
FOLL_WRITE|FOLL_FORCE to break COW.


Yes, my patch seems to be working. The hugetlb code is pretty simple.
And it allows ptrace and the proc pid mem file to work on the executable
private hugetlb mappings.

There is one thing I am unclear about though. hugetlb enforces that
huge_pte_write() is true on FOLL_WRITE in both the fault and
follow_page_mask paths. I am not sure if we can simply assume in the
hugetlb code that if the pte is not writable and this is a write fault
then we're in the FOLL_FORCE|FOLL_WRITE case.  Or do we want to keep the
checks simply not enforce it for FOLL_FORCE|FOLL_WRITE?

The latter is more complicated in the fault path because there is no
FAULT_FLAG_FORCE flag.



I just pushed something to
https://github.com/davidhildenbrand/linux/tree/uprobes_cow

Only very lightly tested so far. Expect the worst :)



I'll try it out and send you the hugetlb bits



I still detest having the zapping logic there, but to get it all right I
don't see a clean way around that.


For hugetlb, we'd primarily have to implement the
mm_walk_ops->hugetlb_entry() callback (well, and FOLL_FORCE).


For FOLL_FORCE, heer is my draft. Let me know if this is what you had in
mind.


diff --git a/mm/gup.c b/mm/gup.c
index 1611e73b1121..ac60e0ae64e8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1056,9 +1056,6 @@ static int check_vma_flags(struct vm_area_struct *vma, 
unsigned long gup_flags)
if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
if (!(gup_flags & FOLL_FORCE))
return -EFAULT;
-   /* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
-   if (is_vm_hugetlb_page(vma))
-   return -EFAULT;
/*
 * We used to let the write,force case do COW in a
 * VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3548eae42cf9..73f86eddf888 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5941,7 +5941,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct 
vm_area_struct *vma,
   struct folio *pagecache_folio, spinlock_t *ptl,
   struct vm_fault *vmf)
  {
-   const bool unshare = flags & FAULT_FLAG_UNSHARE;
+   const bool make_writable = !(flags & FAULT_FLAG_UNSHARE) &&
+   (vma->vm_flags & VM_WRITE);
pte_t pte = huge_ptep_get(ptep);
struct hstate *h = hstate_vma(vma);
struct folio *old_folio;
@@ -5959,16 +5960,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, 
struct vm_area_struct *vma,
 * can trigger this, because hugetlb_fault() will always resolve
 * uffd-wp bit first.
 */
-   if (!unshare && huge_pte_uffd_wp(pte))
+   if (make_writable && huge_pte_uffd_wp(pte))
return 0;
  
-	/*

-* hugetlb does not support FOLL_FORCE-style write faults that keep the
-* PTE mapped R/O such as maybe_mkwrite() would do.
-*/
-   if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
-   return VM_FAULT_SIGSEGV;
-
/* Let's take out MAP_SHARED mappings first. */
if (vma->vm_flags & VM_MAYSHARE) {
set_huge_ptep_writable(vma, haddr, ptep);
@@ -5989,7 +5983,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct 
vm_area_struct *vma,
folio_move_anon_rmap(old_folio, vma);
SetPageAnonExclusive(_folio->page);
}
-   if (likely(!unshare))
+   if (likely(make_writable))
set_huge_ptep_writable(vma, haddr, ptep);


Maybe we want to refactor that similarly into a 
set_huge_ptep_maybe_writable, and handle the VM_WRITE check internally.


Then, here you'd do

if (unshare)
set_huge_ptep(vma, haddr, ptep);
else
set_huge_ptep_maybe_writable(vma, haddr, ptep);

Something like that.




/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
@@ -6883,6 +6878,17 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
  }
  #endif /* CONFIG_USERFAULTFD */
  
+static bool is_force_follow(struct vm_area_struct* vma, unsigned int flags,

+struct page* page) {
+   if (vma->vm_flags & VM_WRITE)
+   return false;
+
+   

Re: [PATCH net-next v9 0/7] Implement reset reason mechanism to detect

2024-04-26 Thread Eric Dumazet
On Thu, Apr 25, 2024 at 5:13 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> In production, there are so many cases about why the RST skb is sent but
> we don't have a very convenient/fast method to detect the exact underlying
> reasons.
>
> RST is implemented in two kinds: passive kind (like tcp_v4_send_reset())
> and active kind (like tcp_send_active_reset()). The former can be traced
> carefully 1) in TCP, with the help of drop reasons, which is based on
> Eric's idea[1], 2) in MPTCP, with the help of reset options defined in
> RFC 8684. The latter is relatively independent, which should be
> implemented on our own, such as active reset reasons which can not be
> replace by skb drop reason or something like this.
>
> In this series, I focus on the fundamental implement mostly about how
> the rstreason mechanism works and give the detailed passive part as an
> example, not including the active reset part. In future, we can go
> further and refine those NOT_SPECIFIED reasons.
>
> Here are some examples when tracing:
> -0   [002] ..s1.  1830.262425: tcp_send_reset: skbaddr=x
> skaddr=x src=x dest=x state=x reason=NOT_SPECIFIED
> -0   [002] ..s1.  1830.262425: tcp_send_reset: skbaddr=x
> skaddr=x src=x dest=x state=x reason=NO_SOCKET
>
> [1]
> Link: 
> https://lore.kernel.org/all/CANn89iJw8x-LqgsWOeJQQvgVg6DnL5aBRLi10QN2WBdr+X4k=w...@mail.gmail.com/

Reviewed-by: Eric Dumazet 

Thanks !



Re: [PATCH net-next v9 7/7] rstreason: make it work in trace world

2024-04-26 Thread Eric Dumazet
On Thu, Apr 25, 2024 at 5:14 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> At last, we should let it work by introducing this reset reason in
> trace world.
>
> One of the possible expected outputs is:
> ... tcp_send_reset: skbaddr=xxx skaddr=xxx src=xxx dest=xxx
> state=TCP_ESTABLISHED reason=NOT_SPECIFIED
>
> Signed-off-by: Jason Xing 
> Reviewed-by: Steven Rostedt (Google) 
> ---

Reviewed-by: Eric Dumazet 



Re: [PATCH net-next v9 6/7] mptcp: introducing a helper into active reset logic

2024-04-26 Thread Eric Dumazet
On Thu, Apr 25, 2024 at 5:14 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> Since we have mapped every mptcp reset reason definition in enum
> sk_rst_reason, introducing a new helper can cover some missing places
> where we have already set the subflow->reset_reason.
>
> Note: using SK_RST_REASON_NOT_SPECIFIED is the same as
> SK_RST_REASON_MPTCP_RST_EUNSPEC. They are both unknown. So we can convert
> it directly.
>
> Suggested-by: Paolo Abeni 
> Signed-off-by: Jason Xing 
> Reviewed-by: Matthieu Baerts (NGI0) 

Reviewed-by: Eric Dumazet 



Re: [PATCH net-next v9 5/7] mptcp: support rstreason for passive reset

2024-04-26 Thread Eric Dumazet
On Thu, Apr 25, 2024 at 5:14 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> It relies on what reset options in the skb are as rfc8684 says. Reusing
> this logic can save us much energy. This patch replaces most of the prior
> NOT_SPECIFIED reasons.
>
> Signed-off-by: Jason Xing 
> Reviewed-by: Matthieu Baerts (NGI0) 

Reviewed-by: Eric Dumazet 



Re: [PATCH net-next v9 3/7] rstreason: prepare for active reset

2024-04-26 Thread Eric Dumazet
On Thu, Apr 25, 2024 at 5:14 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> Like what we did to passive reset:
> only passing possible reset reason in each active reset path.
>
> No functional changes.
>
> Signed-off-by: Jason Xing 
> Acked-by: Matthieu Baerts (NGI0) 

Reviewed-by: Eric Dumazet 



Re: [PATCH net-next v9 2/7] rstreason: prepare for passive reset

2024-04-26 Thread Eric Dumazet
On Thu, Apr 25, 2024 at 5:14 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> Adjust the parameter and support passing reason of reset which
> is for now NOT_SPECIFIED. No functional changes.
>
> Signed-off-by: Jason Xing 
> Acked-by: Matthieu Baerts (NGI0) 

Reviewed-by: Eric Dumazet 



Re: [PATCH net-next v9 1/7] net: introduce rstreason to detect why the RST is sent

2024-04-26 Thread Eric Dumazet
On Thu, Apr 25, 2024 at 5:13 AM Jason Xing  wrote:
>
> From: Jason Xing 
>
> Add a new standalone file for the easy future extension to support
> both active reset and passive reset in the TCP/DCCP/MPTCP protocols.
>
> This patch only does the preparations for reset reason mechanism,
> nothing else changes.
>
> The reset reasons are divided into three parts:
> 1) reuse drop reasons for passive reset in TCP
> 2) our own independent reasons which aren't relying on other reasons at all
> 3) reuse MP_TCPRST option for MPTCP
>
> The benefits of a standalone reset reason are listed here:
> 1) it can cover more than one case, such as reset reasons in MPTCP,
> active reset reasons.
> 2) people can easily/fastly understand and maintain this mechanism.
> 3) we get unified format of output with prefix stripped.
> 4) more new reset reasons are on the way
> ...
>
> I will implement the basic codes of active/passive reset reason in
> those three protocols, which are not complete for this moment. For
> passive reset part in TCP, I only introduce the NO_SOCKET common case
> which could be set as an example.
>
> After this series applied, it will have the ability to open a new
> gate to let other people contribute more reasons into it :)
>
> Signed-off-by: Jason Xing 
> Acked-by: Matthieu Baerts (NGI0) 

Reviewed-by: Eric Dumazet