date:20250506

Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown

2025-05-06 Thread Peng Fan

On Wed, Apr 09, 2025 at 02:46:10PM +0800, Peng Fan wrote:
>On Tue, Apr 08, 2025 at 10:59:58AM -0600, Mathieu Poirier wrote:
>>On Tue, 8 Apr 2025 at 09:02, Peng Fan  wrote:
>>>
>>> On Thu, Apr 03, 2025 at 10:32:39PM +0800, Peng Fan wrote:
>>> >Hi Bjorn,
>>> >
>>> >
>>> >Thanks for replying this thread.
>>> >
>>> >On Wed, Apr 02, 2025 at 08:48:58AM -0500, Bjorn Andersson wrote:
>>> >>On Wed, Apr 02, 2025 at 09:43:55AM +0800, Peng Fan wrote:
>>> >>> On Tue, Apr 01, 2025 at 10:05:03AM -0600, Mathieu Poirier wrote:
>>> >>> >On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
>>> >>...
>>> >>> >
>>> >>> >The core is already checking if @loaded_table is valid in 
>>> >>> >rproc_start(), why
>>> >>> >can't that be used instead of adding yet another check?
>>> >>>
>>> >>> Ah. I was thinking clear table_sz in rpoc_shutdown is an easy approach 
>>> >>> and
>>> >>> could benifit others in case other platforms meet similar issue in 
>>> >>> future.
>>> >>>
>>> >>
>>> >>I like the general idea of keeping things clean and avoid leaving stale
>>> >>data behind.
>>> >>
>>> >>But clearing table_sz during stop in order to hide the fact that the
>>> >>future table_ptr will contain valid data that shouldn't be used, that's
>>> >>just a bug waiting to show up again in the future.
>>> >
>>> >Agree.
>>> >
>>> >Do you need me to post a fix for
>>> >commit efdde3d73ab25ce("remoteproc: core: Clear table_sz when 
>>> >rproc_shutdown")
>>> >by clearing table_sz in rproc_fw_boot and rproc_detach as did in this v2?
>>> >
>>> >To i.MX, the above in-tree patch is ok, so all it fine, and this v2 patch
>>> >could be dropped.
>>> >
>>> >But anyway, if you prefer a follow up fix, please let me know, I
>>> >could post a patch.
>>>
>>> Hi Bjorn, Mathieu,
>>>
>>>  I will wait for one more week to see if any concerns or questions.
>>>  Please raise if you have.
>>>
>>
>>I am working with Bjorn to get your patch reverted.  Once that has
>>happened you can send another patch.

It almost one month passed, I am not sure what status now.
I have patches for i.MX95 that are pending at my local.
I will wait for one more month until 6.16 merge window close, then
post new patches. If any concern, please raise.

Regards,
Peng


>
>ok, I am fine with this.
>
>when get reverted, I need use another method to fix the issue.
>
>I posted two approaches[1], but not get you reply. Since Bjorn raised
>his concern on 1st approach, I think I need to use the 2nd approach without
>touching the core code.
>pasted here,
>"The 2nd approach is to clear rproc->table_sz and rproc->table_ptr in
>imx_rproc_parse_fw before rproc_elf_load_rsc_table.
>"
>
>Or a V3 of current patch with updated commit log.
>
>Please suggest.
>
>If you still have concern or things still not clear to you, please
>let me know.
>
>[1] https://lore.kernel.org/all/20250402014355.GA22575@nxa18884-linux/
>
>Regards,
>Peng
>
>>
>>>  If no, I suppose this thread is done and I will start my other work
>>>  regarding rproc.
>>>
>>> Thanks,
>>> Peng
>>>
>>> >
>>> >Thanks,
>>> >Peng
>>> >
>>> >>
>>> >>Regards,
>>> >>Bjorn
>>> >>
>>> >
>>

Re: [PATCH 2/3] firmware: imx: move get power mode function from scu-pd.c to misc.c

2025-05-06 Thread Peng Fan

On Mon, May 05, 2025 at 12:48:48PM -0300, Hiago De Franco wrote:
>From: Hiago De Franco 
>
>Move imx_sc_get_pd_power() from pmdomain/imx/scu-pd.c to
>firmware/imx/misc.c and rename it to imx_sc_pm_get_resource_power_mode()
>to maintain the same naming logic with other functions in misc.c.
>
>This makes the API available for other use cases. For example,
>remoteproc/imx_rproc.c can now use this function to check the power mode
>of the remote core.

Better put this patch at the first I think.

To be simple, I think just export
imx_sc_get_pd_power in drivers/pmdomain/imx/scu-pd.c.
And add the function declaration in include/linux/firmware/imx/sci.h.

Not sure Ulf or Shawn is good with it.

Regards,
Peng

>
>Signed-off-by: Hiago De Franco 
>---
> drivers/firmware/imx/misc.c   | 47 +++
> drivers/pmdomain/imx/scu-pd.c | 29 -
> include/linux/firmware/imx/svc/misc.h |  8 +
> 3 files changed, 62 insertions(+), 22 deletions(-)
>
>diff --git a/drivers/firmware/imx/misc.c b/drivers/firmware/imx/misc.c
>index d073cb3ce699..61fcb0066ec9 100644
>--- a/drivers/firmware/imx/misc.c
>+++ b/drivers/firmware/imx/misc.c
>@@ -37,6 +37,18 @@ struct imx_sc_msg_resp_misc_get_ctrl {
>   u32 val;
> } __packed __aligned(4);
> 
>+struct imx_sc_msg_req_get_resource_power_mode {
>+  struct imx_sc_rpc_msg hdr;
>+  union {
>+  struct {
>+  u16 resource;
>+  } req;
>+  struct {
>+  u8 mode;
>+  } resp;
>+  } data;
>+} __packed __aligned(4);
>+
> /*
>  * This function sets a miscellaneous control value.
>  *
>@@ -135,3 +147,38 @@ int imx_sc_pm_cpu_start(struct imx_sc_ipc *ipc, u32 
>resource,
>   return imx_scu_call_rpc(ipc, &msg, true);
> }
> EXPORT_SYMBOL(imx_sc_pm_cpu_start);
>+
>+/*
>+ * This function gets the power mode from a given @resource
>+ *
>+ * @param[in] ipc IPC handle
>+ * @param[in] resourceresource to check the power mode
>+ *
>+ * @return Returns < 0 for errors or the following for success:
>+ * IMX_SC_PM_PW_MODE_OFF  0   Power off
>+ * IMX_SC_PM_PW_MODE_STBY 1   Power in standby
>+ * IMX_SC_PM_PW_MODE_LP   2   Power in low-power
>+ * IMX_SC_PM_PW_MODE_ON   3   Power on
>+ *
>+ * These are defined under firmware/imx/svc/pm.h
>+ */
>+int imx_sc_pm_get_resource_power_mode(struct imx_sc_ipc *ipc, u32 resource)
>+{
>+  struct imx_sc_msg_req_get_resource_power_mode msg;
>+  struct imx_sc_rpc_msg *hdr = &msg.hdr;
>+  int ret;
>+
>+  hdr->ver = IMX_SC_RPC_VERSION;
>+  hdr->svc = IMX_SC_RPC_SVC_PM;
>+  hdr->func = IMX_SC_PM_FUNC_GET_RESOURCE_POWER_MODE;
>+  hdr->size = 2;
>+
>+  msg.data.req.resource = resource;
>+
>+  ret = imx_scu_call_rpc(ipc, &msg, true);
>+  if (ret)
>+  return ret;
>+
>+  return msg.data.resp.mode;
>+}
>+EXPORT_SYMBOL(imx_sc_pm_get_resource_power_mode);
>diff --git a/drivers/pmdomain/imx/scu-pd.c b/drivers/pmdomain/imx/scu-pd.c
>index 01d465d88f60..945f972e636f 100644
>--- a/drivers/pmdomain/imx/scu-pd.c
>+++ b/drivers/pmdomain/imx/scu-pd.c
>@@ -54,6 +54,7 @@
> #include 
> #include 
> #include 
>+#include 
> #include 
> #include 
> #include 
>@@ -328,27 +329,6 @@ static void imx_sc_pd_get_console_rsrc(void)
>   imx_con_rsrc = specs.args[0];
> }
> 
>-static int imx_sc_get_pd_power(struct device *dev, u32 rsrc)
>-{
>-  struct imx_sc_msg_req_get_resource_power_mode msg;
>-  struct imx_sc_rpc_msg *hdr = &msg.hdr;
>-  int ret;
>-
>-  hdr->ver = IMX_SC_RPC_VERSION;
>-  hdr->svc = IMX_SC_RPC_SVC_PM;
>-  hdr->func = IMX_SC_PM_FUNC_GET_RESOURCE_POWER_MODE;
>-  hdr->size = 2;
>-
>-  msg.data.req.resource = rsrc;
>-
>-  ret = imx_scu_call_rpc(pm_ipc_handle, &msg, true);
>-  if (ret)
>-  dev_err(dev, "failed to get power resource %d mode, ret %d\n",
>-  rsrc, ret);
>-
>-  return msg.data.resp.mode;
>-}
>-
> static int imx_sc_pd_power(struct generic_pm_domain *domain, bool power_on)
> {
>   struct imx_sc_msg_req_set_resource_power_mode msg;
>@@ -438,7 +418,12 @@ imx_scu_add_pm_domain(struct device *dev, int idx,
>   if (imx_con_rsrc == sc_pd->rsrc)
>   sc_pd->pd.flags = GENPD_FLAG_RPM_ALWAYS_ON;
> 
>-  mode = imx_sc_get_pd_power(dev, pd_ranges->rsrc + idx);
>+  mode = imx_sc_pm_get_resource_power_mode(pm_ipc_handle,
>+   pd_ranges->rsrc + idx);
>+  if (mode < 0)
>+  dev_err(dev, "failed to get power resource %d mode, ret %d\n",
>+  pd_ranges->rsrc + idx, mode);
>+
>   if (mode == IMX_SC_PM_PW_MODE_ON)
>   is_off = false;
>   else
>diff --git a/include/linux/firmware/imx/svc/misc.h 
>b/include/linux/firmware/imx/svc/misc.h
>index 760db08a67fc..376c800a88d0 100644
>--- a/include/linux/firmware/imx/svc/misc.h
>+++ b/include/linux/firmware/imx/svc/misc.h
>@@ -55,6 +55,8 @@ int imx_s

Re: [PATCH 1/3] remoteproc: imx_rproc: skip clock enable when M-core is managed by the SCU

2025-05-06 Thread Peng Fan

On Mon, May 05, 2025 at 12:48:47PM -0300, Hiago De Franco wrote:
>From: Hiago De Franco 
>
>For the i.MX8X and i.MX8 family SoCs, when the M-core is powered up
>before Linux starts (e.g., by the bootloader) and it is being managed by
>the SCU, the SCFW will not allow the kernel to enable the clock again.
>This currently causes an SCU fault reset when the M-core is up and
>running and the kernel boots, resetting the system.
>
>Therefore, add a check in the clock enable function to not execute it if
>the M-core is being managed by the SCU.
>
>This change affects only the i.MX8X and i.MX8 family SoCs, as this is
>under the IMX_RPROC_SCU_API method.

I would rewrite as below: "

For the i.MX8X and i.MX8 family SoCs, when the M-core is powered up
by the bootloader, M-core and Linux are in same SCFW(System Controller
Firmware) partition, so linux has permission to control M-core.

But when M-core is started, the SCFW will automatically enable the clock
and configure the rate, and any users that wanna to enable the clock
will get error 'LOCKED' from SCFW. So current imx_rproc.c probe function
gets failure because clk_prepare_enable returns failure. Then
the power domain of M-core is powered off when M-core is still running,
SCU(System Controller Unit) will get a fault reset, and system restarts.

To address the issue, ignore handling the clk for i.MX8X and i.MX8 M-core,
because SCFW automatically enables and configures the clock.
"

You may update if you wanna.

>
>Signed-off-by: Hiago De Franco 
>Suggested-by: Peng Fan 

-> peng@nxp.com

Thanks,
Peng

>---
> drivers/remoteproc/imx_rproc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
>index 74299af1d7f1..627e57a88db2 100644
>--- a/drivers/remoteproc/imx_rproc.c
>+++ b/drivers/remoteproc/imx_rproc.c
>@@ -1029,8 +1029,8 @@ static int imx_rproc_clk_enable(struct imx_rproc *priv)
>   struct device *dev = priv->dev;
>   int ret;
> 
>-  /* Remote core is not under control of Linux */
>-  if (dcfg->method == IMX_RPROC_NONE)
>+  /* Remote core is not under control of Linux or it is managed by SCU 
>API */
>+  if (dcfg->method == IMX_RPROC_NONE || dcfg->method == IMX_RPROC_SCU_API)
>   return 0;
> 
>   priv->clk = devm_clk_get(dev, NULL);
>-- 
>2.39.5
>

Re: [PATCH v2 06/19] iommufd/viommu: Add IOMMU_VIOMMU_SET/UNSET_VDEV_ID ioctl

2025-05-06 Thread Alexey Kardashevskiy





On 6/5/25 12:53, Nicolin Chen wrote:

On Mon, May 05, 2025 at 02:08:07PM -0300, Jason Gunthorpe wrote:

On Wed, Apr 30, 2025 at 12:58:47AM -0700, Nicolin Chen wrote:


... and I just hit a problem with it - this is basically guest BDFn
and it works as long as I'm hotplugging the TEE-IO VF into an SNP VM
but does not when I pass through via the QEMU cmdline - bus numbers
are not assigned yet. So I have to postpone the vdevice allocation
till run time, did I miss something here? Thanks,


I have a similar case with QEMU ARM64's VM: so vDEVICE on ARM is
allocated at runtime as well because the BDF number isn't ready
at the boot time.


Oh that's ugly then.. So you'll need to add some kind of 'modify
sid/bdf' operation I think.


But the initial vDEVICE would be still unusable. Its BDF number is
literally 0 in my case. It can't be used for SID-based invalidation
nor the reverse vSID lookup for fault injection..


The bus numbers can be reassigned at any time on the fly by the guest
by reprogramming the PCI hierarchy.


Yes. If we take some aggressive use case into account, where its
BDF number could change multiple times, I think it's natural for
VMM to simply destroy the previous vDEVICE and allocate a new one
with a new BDF number, right?


Yup, this sounds about right, recreate on the bus number change.



Thanks
Nicolin


--
Alexey

Re: [PATCH v2] kbuild: Require pahole v1.29 with GENDWARFKSYMS on X86

2025-05-06 Thread Masahiro Yamada

On Tue, Apr 8, 2025 at 8:08 AM Sami Tolvanen  wrote:
>
> With CONFIG_GENDWARFKSYMS, __gendwarfksyms_ptr variables are
> added to the kernel in EXPORT_SYMBOL() to ensure DWARF type
> information is available for exported symbols in the TUs where
> they're actually exported. These symbols are dropped when linking
> vmlinux, but dangling references to them remain in DWARF.
>
> With CONFIG_DEBUG_INFO_BTF enabled on X86, pahole versions after
> commit 47dcb534e253 ("btf_encoder: Stop indexing symbols for
> VARs") and before commit 9810758003ce ("btf_encoder: Verify 0
> address DWARF variables are in ELF section") place these symbols
> in the .data..percpu section, which results in an "Invalid
> offset" error in btf_datasec_check_meta() during boot, as all
> the variables are at zero offset and have non-zero size. If
> CONFIG_DEBUG_INFO_BTF_MODULES is enabled, this also results in a
> failure to load modules with:
>
>   failed to validate module [$module] BTF: -22
>
> As the issue occurs in pahole v1.28 and the fix was merged
> after v1.29 was released, require pahole v1.29 when
> GENDWARFKSYMS is enabled with DEBUG_INFO_BTF on X86.
>
> Reported-by: Paolo Pisati 
> Signed-off-by: Sami Tolvanen 
> ---
>

Applied to linux-kbuild.

Thanks.



> Changes in v2:
> - Also allow pahole  - Update comment to include the affected commit range.
>
> ---
>  kernel/module/Kconfig | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> index d7762ef5949a..39278737bb68 100644
> --- a/kernel/module/Kconfig
> +++ b/kernel/module/Kconfig
> @@ -192,6 +192,11 @@ config GENDWARFKSYMS
> depends on !DEBUG_INFO_REDUCED && !DEBUG_INFO_SPLIT
> # Requires ELF object files.
> depends on !LTO
> +   # To avoid conflicts with the discarded __gendwarfksyms_ptr symbols on
> +   # X86, requires pahole before commit 47dcb534e253 ("btf_encoder: Stop
> +   # indexing symbols for VARs") or after commit 9810758003ce 
> ("btf_encoder:
> +   # Verify 0 address DWARF variables are in ELF section").
> +   depends on !X86 || !DEBUG_INFO_BTF || PAHOLE_VERSION < 128 || 
> PAHOLE_VERSION > 129
> help
>   Calculate symbol versions from DWARF debugging information using
>   gendwarfksyms. Requires DEBUG_INFO to be enabled.
>
> base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
> --
> 2.49.0.504.g3bcea36a83-goog
>


-- 
Best Regards
Masahiro Yamada

Re: [PATCH] rcu/nocb: Add Safe checks for access offloaded rdp

2025-05-06 Thread Z qiang

>
>
>
> On 4/30/2025 12:14 PM, Joel Fernandes wrote:
> >
> >
> > On 4/30/2025 10:57 AM, Z qiang wrote:
> >>>
> >>>
> >>>
> >>> On 4/28/2025 6:59 AM, Z qiang wrote:
> >
> > Le Mon, Apr 28, 2025 at 05:54:03PM +0800, Zqiang a écrit :
> >> For Preempt-RT kernel, when enable CONFIG_PROVE_RCU Kconfig,
> >> disable local bh in rcuc kthreads will not affect preempt_count(),
> >> this resulted in the following splat:
> >>
> >> WARNING: suspicious RCU usage
> >> kernel/rcu/tree_plugin.h:36 Unsafe read of RCU_NOCB offloaded state!
> >> stack backtrace:
> >> CPU: 0 UID: 0 PID: 22 Comm: rcuc/0
> >> Call Trace:
> >> [0.407907]  
> >> [0.407910]  dump_stack_lvl+0xbb/0xd0
> >> [0.407917]  dump_stack+0x14/0x20
> >> [0.407920]  lockdep_rcu_suspicious+0x133/0x210
> >> [0.407932]  rcu_rdp_is_offloaded+0x1c3/0x270
> >> [0.407939]  rcu_core+0x471/0x900
> >> [0.407942]  ? lockdep_hardirqs_on+0xd5/0x160
> >> [0.407954]  rcu_cpu_kthread+0x25f/0x870
> >> [0.407959]  ? __pfx_rcu_cpu_kthread+0x10/0x10
> >> [0.407966]  smpboot_thread_fn+0x34c/0xa50
> >> [0.407970]  ? trace_preempt_on+0x54/0x120
> >> [0.407977]  ? __pfx_smpboot_thread_fn+0x10/0x10
> >> [0.407982]  kthread+0x40e/0x840
> >> [0.407990]  ? __pfx_kthread+0x10/0x10
> >> [0.407994]  ? rt_spin_unlock+0x4e/0xb0
> >> [0.407997]  ? rt_spin_unlock+0x4e/0xb0
> >> [0.408000]  ? __pfx_kthread+0x10/0x10
> >> [0.408006]  ? __pfx_kthread+0x10/0x10
> >> [0.408011]  ret_from_fork+0x40/0x70
> >> [0.408013]  ? __pfx_kthread+0x10/0x10
> >> [0.408018]  ret_from_fork_asm+0x1a/0x30
> >> [0.408042]  
> >>
> >> Currently, triggering an rdp offloaded state change need the
> >> corresponding rdp's CPU goes offline, and at this time the rcuc
> >> kthreads has already in parking state. this means the corresponding
> >> rcuc kthreads can safely read offloaded state of rdp while it's
> >> corresponding cpu is online.
> >>
> >> This commit therefore add rdp->rcu_cpu_kthread_task check for
> >> Preempt-RT kernels.
> >>
> >> Signed-off-by: Zqiang 
> >> ---
> >>  kernel/rcu/tree_plugin.h | 4 +++-
> >>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >> index 003e549f6514..fe728eded36e 100644
> >> --- a/kernel/rcu/tree_plugin.h
> >> +++ b/kernel/rcu/tree_plugin.h
> >> @@ -31,7 +31,9 @@ static bool rcu_rdp_is_offloaded(struct rcu_data 
> >> *rdp)
> >> lockdep_is_held(&rcu_state.nocb_mutex) ||
> >> (!(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible()) 
> >> &&
> >>  rdp == this_cpu_ptr(&rcu_data)) ||
> >> -   rcu_current_is_nocb_kthread(rdp)),
> >> +   rcu_current_is_nocb_kthread(rdp) ||
> >> +   (IS_ENABLED(CONFIG_PREEMPT_RT) &&
> >> +current == rdp->rcu_cpu_kthread_task)),
> >
> > Isn't it safe also on !CONFIG_PREEMPT_RT ?
> 
>  For !CONFIG_PREEMPT_RT and  in rcuc kthreads, it's also safe,
>  but the following check will passed :
> 
>  (!(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible()) &&
>    rdp == this_cpu_ptr(&rcu_data))
> >>>
> >>> I think the fact that it already passes for !PREEMPT_RT does not matter, 
> >>> because
> >>> it simplifies the code so drop the PREEMPT_RT check?
> >>>
> >>> Or will softirq_count() not work? It appears to have special casing for
> >>> PREEMPT_RT's local_bh_disable():
> >>>
> >>> (   ( !(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible()) || 
> >>> softirq_count() )
> >>>&& rdp == this_cpu_ptr(&rcu_data))  )
> >>
> >> Thank you for Joel's reply,  I also willing to accept such
> >> modifications and resend :) .
> > Thanks, I am Ok with either approach whichever you and Frederic together 
> > decide.
> > I can then pull this in for the v6.16 merge window once you resend, thanks!
> >
>
> Frederic, there are a couple of ways we can move forward hear. Does the
> softirq_count() approach sound good to you? If yes, I can fixup the patch 
> myself.

Hello, Joel

If you send a patch to fix it, I'd be happy,  you can add me as the
Reported-by ;)

Thanks
Zqiang

>
> I am also Ok at this point to take it in for 6.16, though I've also stored it 
> in
> my rcu/dev branch for Neeraj's 6.17 PR, just in case :)
>
>  - Joel
>
>

Re: [PATCH 2/7] rust: kunit: support checked `-> Result`s in KUnit `#[test]`s

2025-05-06 Thread David Gow

On Tue, 6 May 2025 at 03:34, Boqun Feng  wrote:
>
> On Mon, May 05, 2025 at 02:02:09PM +0800, David Gow wrote:
> > On Sat, 3 May 2025 at 05:51, Miguel Ojeda  wrote:
> > >
> > > Currently, return values of KUnit `#[test]` functions are ignored.
> > >
> > > Thus introduce support for `-> Result` functions by checking their
> > > returned values.
> > >
> > > At the same time, require that test functions return `()` or `Result > > E>`, which should avoid mistakes, especially with non-`#[must_use]`
> > > types. Other types can be supported in the future if needed.
> > >
> > > With this, a failing test like:
> > >
> > > #[test]
> > > fn my_test() -> Result {
> > > f()?;
> > > Ok(())
> > > }
> > >
> > > will output:
> > >
> > > [3.744214] KTAP version 1
> > > [3.744287] # Subtest: my_test_suite
> > > [3.744378] # speed: normal
> > > [3.744399] 1..1
> > > [3.745817] # my_test: ASSERTION FAILED at 
> > > rust/kernel/lib.rs:321
> > > [3.745817] Expected is_test_result_ok(my_test()) to be true, 
> > > but is false
> > > [3.747152] # my_test.speed: normal
> > > [3.747199] not ok 1 my_test
> > > [3.747345] not ok 4 my_test_suite
> > >
> > > Signed-off-by: Miguel Ojeda 
> > > ---
> >
> > This is a neat idea. I think a lot of tests will still want to go with
> > the () return value, but having both as an option seems like a good
> > idea to me.
> >
>
> Handling the return value of a test is definitely a good thing, but I'm
> not sure returning `Err` should be treated as assertion failure
> though. Considering:
>
> #[test]
> fn foo() -> Result {
> let b = KBox::new(...)?; // need to allocate memory for the test
> use_box(b);
> assert!(...);
> }
>
> if the test runs without enough memory, then KBox::new() would return a
> Err(ENOMEM), and technically, it's not a test failure rather the test
> has been skipped because of no enough resource.
>
> I would suggest we handle the `Err` returns specially (mark as "SKIPPED"
> maybe?).
>
> Thoughts?
>
> Regards,
> Boqun
>

FWIW, having out-of-memory situations trigger a test failure is
consistent with what other KUnit tests (written in C) do.

There's both advantages and disadvantages to this: on the one hand,
it's prone to false positives (as you mention), on the other, it
catches cases where the test is using an unusually large amount of
memory (which could indeed be a test issues).

My go-to rule is that tests should fail if small allocations (which,
in the normal course of events, _should_ succeed) fail, but if they
have unusual resource requirements (beyond "enough memory for the
system to run normally") these should be checked separately when the
test starts.

That being said, I definitely think that, by default, an `Err` return
should map to a FAILED test result, as not all Err Results are a
resource exhaustion issue, and we definitely don't want to mark a test
as skipped if there's a real error occurring. If test authors wish to
skip a test when an out-of-memory condition occurs, they probably
should handle that explicitly. (But I'd not be opposed to helpers to
make it easier.)

Cheers,
-- David

smime.p7s
Description: S/MIME Cryptographic Signature

Re: [PATCH v2 3/4] media: i2c: imx214: Make use of CCS PLL calculator

2025-05-06 Thread Sakari Ailus

Hi André,

A few more comments below.

On Mon, May 05, 2025 at 11:05:55PM +0200, André Apitzsch via B4 Relay wrote:
> From: André Apitzsch 
> 
> Calculate PLL parameters based on clock frequency and link frequency.
> 
> Acked-by: Ricardo Ribalda 
> Signed-off-by: André Apitzsch 
> ---
>  drivers/media/i2c/Kconfig  |   1 +
>  drivers/media/i2c/imx214.c | 216 
> +
>  2 files changed, 178 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/media/i2c/Kconfig b/drivers/media/i2c/Kconfig
> index 
> e576b213084d232e90b7e556a7a855a3bb95544c..c8e24c42e0c4ea169f1b6cdc4f2631234a51c7d9
>  100644
> --- a/drivers/media/i2c/Kconfig
> +++ b/drivers/media/i2c/Kconfig
> @@ -141,6 +141,7 @@ config VIDEO_IMX214
>   depends on GPIOLIB
>   select REGMAP_I2C
>   select V4L2_CCI_I2C
> + select VIDEO_CCS_PLL
>   help
> This is a Video4Linux2 sensor driver for the Sony
> IMX214 camera.
> diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c
> index 
> 3aca6ebb02d649c1b7f0b6a6049c1e3aa3d08951..9e9be47394ec768a5b34d44b06b5bbb0988da5a1
>  100644
> --- a/drivers/media/i2c/imx214.c
> +++ b/drivers/media/i2c/imx214.c
> @@ -20,6 +20,8 @@
>  #include 
>  #include 
>  
> +#include "ccs-pll.h"
> +
>  /* Chip ID */
>  #define IMX214_REG_CHIP_ID   CCI_REG16(0x0016)
>  #define IMX214_CHIP_ID   0x0214
> @@ -34,7 +36,6 @@
>  #define IMX214_DEFAULT_LINK_FREQ 6
>  /* Keep wrong link frequency for backward compatibility */
>  #define IMX214_DEFAULT_LINK_FREQ_LEGACY  48000
> -#define IMX214_DEFAULT_PIXEL_RATE ((IMX214_DEFAULT_LINK_FREQ * 8LL) / 10)
>  #define IMX214_FPS 30
>  
>  /* V-TIMING internal */
> @@ -84,6 +85,7 @@
>  #define IMX214_CSI_DATA_FORMAT_RAW10 0x0A0A
>  #define IMX214_CSI_DATA_FORMAT_COMP6 0x0A06
>  #define IMX214_CSI_DATA_FORMAT_COMP8 0x0A08
> +#define IMX214_BITS_PER_PIXEL_MASK   0xFF
>  
>  #define IMX214_REG_CSI_LANE_MODE CCI_REG8(0x0114)
>  #define IMX214_CSI_2_LANE_MODE   1
> @@ -249,6 +251,10 @@ struct imx214 {
>   struct clk *xclk;
>   struct regmap *regmap;
>  
> + struct ccs_pll pll;
> +
> + struct v4l2_fwnode_endpoint bus_cfg;
> +
>   struct v4l2_subdev sd;
>   struct media_pad pad;
>  
> @@ -758,16 +764,22 @@ static int imx214_configure_pll(struct imx214 *imx214)
>  {
>   int ret = 0;
>  
> - cci_write(imx214->regmap, IMX214_REG_VTPXCK_DIV, 5, &ret);
> - cci_write(imx214->regmap, IMX214_REG_VTSYCK_DIV, 2, &ret);
> - cci_write(imx214->regmap, IMX214_REG_PREPLLCK_VT_DIV, 3, &ret);
> - cci_write(imx214->regmap, IMX214_REG_PLL_VT_MPY, 150, &ret);
> - cci_write(imx214->regmap, IMX214_REG_OPPXCK_DIV, 10, &ret);
> - cci_write(imx214->regmap, IMX214_REG_OPSYCK_DIV, 1, &ret);
> + cci_write(imx214->regmap, IMX214_REG_VTPXCK_DIV,
> +   imx214->pll.vt_bk.pix_clk_div, &ret);
> + cci_write(imx214->regmap, IMX214_REG_VTSYCK_DIV,
> +   imx214->pll.vt_bk.sys_clk_div, &ret);
> + cci_write(imx214->regmap, IMX214_REG_PREPLLCK_VT_DIV,
> +   imx214->pll.vt_fr.pre_pll_clk_div, &ret);
> + cci_write(imx214->regmap, IMX214_REG_PLL_VT_MPY,
> +   imx214->pll.vt_fr.pll_multiplier, &ret);
> + cci_write(imx214->regmap, IMX214_REG_OPPXCK_DIV,
> +   imx214->pll.op_bk.pix_clk_div, &ret);
> + cci_write(imx214->regmap, IMX214_REG_OPSYCK_DIV,
> +   imx214->pll.op_bk.sys_clk_div, &ret);
>   cci_write(imx214->regmap, IMX214_REG_PLL_MULT_DRIV,
> IMX214_PLL_SINGLE, &ret);
>   cci_write(imx214->regmap, IMX214_REG_EXCK_FREQ,
> -   IMX214_EXCK_FREQ(IMX214_DEFAULT_CLK_FREQ / 100), &ret);
> +   IMX214_EXCK_FREQ(imx214->pll.ext_clk_freq_hz / 100), 
> &ret);
>  
>   return ret;
>  }
> @@ -872,9 +884,6 @@ static const struct v4l2_ctrl_ops imx214_ctrl_ops = {
>  
>  static int imx214_ctrls_init(struct imx214 *imx214)
>  {
> - static const s64 link_freq[] = {
> - IMX214_DEFAULT_LINK_FREQ
> - };
>   static const struct v4l2_area unit_size = {
>   .width = 1120,
>   .height = 1120,
> @@ -895,15 +904,14 @@ static int imx214_ctrls_init(struct imx214 *imx214)
>   if (ret)
>   return ret;
>  
> - imx214->pixel_rate = v4l2_ctrl_new_std(ctrl_hdlr, NULL,
> -V4L2_CID_PIXEL_RATE, 0,
> -IMX214_DEFAULT_PIXEL_RATE, 1,
> -IMX214_DEFAULT_PIXEL_RATE);
> + imx214->pixel_rate =
> + v4l2_ctrl_new_std(ctrl_hdlr, NULL, V4L2_CID_PIXEL_RATE, 1,
> +   INT_MAX, 1, 1);
>  
>   imx214->link_freq = v4l2_ctrl_new_int_menu(ctrl_hdlr, NULL,
>  V4L2_CID_LINK_FREQ,
> -ARRAY_SIZE(link_freq) - 1,
> -

Re: [PATCH v2 4/4] media: i2c: imx214: Remove hard-coded external clock frequency

2025-05-06 Thread Sakari Ailus

Hi André,

On Mon, May 05, 2025 at 11:05:56PM +0200, André Apitzsch via B4 Relay wrote:
> From: André Apitzsch 
> 
> Instead rely on the rate set on the clock (using assigned-clock-rates
> etc.)
> 
> Signed-off-by: André Apitzsch 
> ---
>  drivers/media/i2c/imx214.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c
> index 
> 9e9be47394ec768a5b34d44b06b5bbb0988da5a1..c12996e294dccebb18c608254f1e0d14dc064423
>  100644
> --- a/drivers/media/i2c/imx214.c
> +++ b/drivers/media/i2c/imx214.c
> @@ -32,7 +32,6 @@
>  
>  #define IMX214_REG_FAST_STANDBY_CTRL CCI_REG8(0x0106)
>  
> -#define IMX214_DEFAULT_CLK_FREQ  2400
>  #define IMX214_DEFAULT_LINK_FREQ 6
>  /* Keep wrong link frequency for backward compatibility */
>  #define IMX214_DEFAULT_LINK_FREQ_LEGACY  48000
> @@ -1405,11 +1404,6 @@ static int imx214_probe(struct i2c_client *client)
>   return dev_err_probe(dev, PTR_ERR(imx214->xclk),
>"failed to get xclk\n");
>  
> - ret = clk_set_rate(imx214->xclk, IMX214_DEFAULT_CLK_FREQ);
> - if (ret)
> - return dev_err_probe(dev, ret,
> -  "failed to set xclk frequency\n");
> -

Oops. I missed this is what the driver was doing already. Indeed, this is
one of the historic sensor drivers that do set the frequency in DT systems.

The driver never used the clock-frequency property and instead used a fixed
frequency. Changing the behaviour now could be problematic.

There are options here that I think we could do:

1) use your v1 patch (4) which uses "clock-frequency" if it exists and
otherwise uses the default, fixed frequency or

2) set the frequency only if the "clock-frequency" property exists. The DT
currently requires clock-frequency and the YAML conversion was done in 2020
whereas the driver is from 2018. If we do this, the clock-frequency should
be deprecated (or even removed from bingings).

I wonder what others think. Cc'd Laurent in any case.

>   ret = imx214_get_regulators(dev, imx214);
>   if (ret < 0)
>   return dev_err_probe(dev, ret, "failed to get regulators\n");
> 

-- 
Regards,

Sakari Ailus

Re: [PATCH v11 2/3] rust: add parameter support to the `module!` macro

2025-05-06 Thread Alice Ryhl

On Mon, May 05, 2025 at 11:55:33AM +0200, Andreas Hindborg wrote:
> "Alice Ryhl"  writes:
> 
> > On Fri, May 02, 2025 at 02:16:35PM +0200, Andreas Hindborg wrote:
> > It would be a use-after-free to
> > access it during module teardown. For example, what if I access this
> > static during its own destructor? Or during the destructor of another
> > module parameter?
> 
> Yes, that is a problem.
> 
> We can get around it for now by just not calling `free` for now. We only
> support simple types that do not need drop. I think we would have to
> seal the `ModuleParam` trait for this.
> 
> For a proper solution, we could
>  - Require a token to read the parameter.
>  - Synchronize on a module private field and return an option from the
>parameter getter. This would require module exit to run before param
>free. I think this is the case, but I did not check.
>  - Use a `Revocable` and revoke the parameter in `free`.
> 
> Any other ideas or comments on the outlined solutions?

I think the simplest you can do right now is

trait ModuleParam: Copy

so that it can't contain any non-trivial values. That way you don't need
Drop either.

Long term, I think we need a way to detect whether it's safe to access
module globals. The exact same problem applies to the existing global
for the module itself - except it's worse there because we can't access
that one during init either.

Alice

Re: [PATCH] selftests/run_kselftest.sh: Use readlink if realpath is not available

2025-05-06 Thread Yosry Ahmed

On Fri, Mar 28, 2025 at 02:05:43PM -0600, Shuah Khan wrote:
> On 3/18/25 10:05, Yosry Ahmed wrote:
> > 'realpath' is not always available,  fallback to 'readlink -f' if is not
> > available. They seem to work equally well in this context.
> 
> Can you add more specifics on "realpath" is not always available,"
> 
> No issues with the patch itself. I would like to know the cases
> where "realpath" command is missing.

I think I already responded but I can't find my response, anyway:

Not all distros have realpath. In my case, it was an internal distro we
use on some test machines, so I can't really share much details about
it.

Are there any other questions that I can help answer about this patch?

> 
> 
> > 
> > Signed-off-by: Yosry Ahmed 
> > ---
> >   tools/testing/selftests/run_kselftest.sh | 9 -
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/selftests/run_kselftest.sh 
> > b/tools/testing/selftests/run_kselftest.sh
> > index 50e03eefe7ac7..0443beacf3621 100755
> > --- a/tools/testing/selftests/run_kselftest.sh
> > +++ b/tools/testing/selftests/run_kselftest.sh
> > @@ -3,7 +3,14 @@
> >   #
> >   # Run installed kselftest tests.
> >   #
> > -BASE_DIR=$(realpath $(dirname $0))
> > +
> > +# Fallback to readlink if realpath is not available
> > +if which realpath > /dev/null; then
> > +BASE_DIR=$(realpath $(dirname $0))
> > +else
> > +BASE_DIR=$(readlink -f $(dirname $0))
> > +fi
> > +
> >   cd $BASE_DIR
> >   TESTS="$BASE_DIR"/kselftest-list.txt
> >   if [ ! -r "$TESTS" ] ; then
> 
> thanks,
> -- Shuah
>

[PATCH 4/6] x86/msr: Move MSR trace calls one function level up

2025-05-06 Thread Juergen Gross

In order to prepare paravirt inlining of the MSR access instructions
move the calls of MSR trace functions one function level up.

Introduce helpers {read|write}_msr[_safe]() helpers allowing to have
common definitions in msr.h doing the trace calls.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/msr.h  | 102 
 arch/x86/include/asm/paravirt.h |  38 +++-
 2 files changed, 73 insertions(+), 67 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index a9ce56fc8785..3a94cffb6a3e 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -103,14 +103,7 @@ static __always_inline u64 native_rdmsrq(u32 msr)
 
 static inline u64 native_read_msr(u32 msr)
 {
-   u64 val;
-
-   val = __rdmsr(msr);
-
-   if (tracepoint_enabled(read_msr))
-   do_trace_read_msr(msr, val, 0);
-
-   return val;
+   return __rdmsr(msr);
 }
 
 static inline int native_read_msr_safe(u32 msr, u64 *p)
@@ -123,8 +116,6 @@ static inline int native_read_msr_safe(u32 msr, u64 *p)
 _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
 : [err] "=r" (err), EAX_EDX_RET(val, low, high)
 : "c" (msr));
-   if (tracepoint_enabled(read_msr))
-   do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), err);
 
*p = EAX_EDX_VAL(val, low, high);
 
@@ -135,9 +126,6 @@ static inline int native_read_msr_safe(u32 msr, u64 *p)
 static inline void notrace native_write_msr(u32 msr, u64 val)
 {
native_wrmsrq(msr, val);
-
-   if (tracepoint_enabled(write_msr))
-   do_trace_write_msr(msr, val, 0);
 }
 
 /* Can be uninlined because referenced by paravirt */
@@ -151,8 +139,6 @@ static inline int notrace native_write_msr_safe(u32 msr, 
u64 val)
 : [err] "=a" (err)
 : "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
 : "memory");
-   if (tracepoint_enabled(write_msr))
-   do_trace_write_msr(msr, val, err);
return err;
 }
 
@@ -173,59 +159,96 @@ static inline u64 native_read_pmc(int counter)
 #include 
 #else
 #include 
+static __always_inline u64 read_msr(u32 msr)
+{
+   return native_read_msr(msr);
+}
+
+static __always_inline int read_msr_safe(u32 msr, u64 *p)
+{
+   return native_read_msr_safe(msr, p);
+}
+
+static __always_inline void write_msr(u32 msr, u64 val)
+{
+   native_write_msr(msr, val);
+}
+
+static __always_inline int write_msr_safe(u32 msr, u64 val)
+{
+   return native_write_msr_safe(msr, val);
+}
+
+static __always_inline u64 rdpmc(int counter)
+{
+   return native_read_pmc(counter);
+}
+
+#endif /* !CONFIG_PARAVIRT_XXL */
+
 /*
  * Access to machine-specific registers (available on 586 and better only)
  * Note: the rd* operations modify the parameters directly (without using
  * pointer indirection), this allows gcc to optimize better
  */
 
+#define rdmsrq(msr, val)   \
+do {   \
+   (val) = read_msr(msr);  \
+   if (tracepoint_enabled(read_msr))   \
+   do_trace_read_msr(msr, val, 0); \
+} while (0)
+
 #define rdmsr(msr, low, high)  \
 do {   \
-   u64 __val = native_read_msr((msr)); \
+   u64 __val;  \
+   rdmsrq(msr, __val); \
(void)((low) = (u32)__val); \
(void)((high) = (u32)(__val >> 32));\
 } while (0)
 
-static inline void wrmsr(u32 msr, u32 low, u32 high)
+/* rdmsr with exception handling */
+static inline int rdmsrq_safe(u32 msr, u64 *p)
 {
-   native_write_msr(msr, (u64)high << 32 | low);
-}
+   int err;
 
-#define rdmsrq(msr, val)   \
-   ((val) = native_read_msr((msr)))
+   err = read_msr_safe(msr, p);
 
-static inline void wrmsrq(u32 msr, u64 val)
-{
-   native_write_msr(msr, val);
-}
+   if (tracepoint_enabled(read_msr))
+   do_trace_read_msr(msr, *p, err);
 
-/* wrmsr with exception handling */
-static inline int wrmsrq_safe(u32 msr, u64 val)
-{
-   return native_write_msr_safe(msr, val);
+   return err;
 }
 
-/* rdmsr with exception handling */
 #define rdmsr_safe(msr, low, high) \
 ({ \
u64 __val;  \
-   int __err = native_read_msr_safe((msr), &__val);\
+   int __err = rdmsrq_safe((msr), &__val); \
(*low) = (u32)__val;\
(*high) = (u32)(__val >> 32);   \
__err;  \
 })
 
-static

RE: [PATCH v2 06/19] iommufd/viommu: Add IOMMU_VIOMMU_SET/UNSET_VDEV_ID ioctl

2025-05-06 Thread Tian, Kevin

> From: Nicolin Chen 
> Sent: Tuesday, May 6, 2025 10:54 AM
> 
> On Mon, May 05, 2025 at 02:08:07PM -0300, Jason Gunthorpe wrote:
> > On Wed, Apr 30, 2025 at 12:58:47AM -0700, Nicolin Chen wrote:
> >
> > > > ... and I just hit a problem with it - this is basically guest BDFn
> > > > and it works as long as I'm hotplugging the TEE-IO VF into an SNP VM
> > > > but does not when I pass through via the QEMU cmdline - bus numbers
> > > > are not assigned yet. So I have to postpone the vdevice allocation
> > > > till run time, did I miss something here? Thanks,
> > >
> > > I have a similar case with QEMU ARM64's VM: so vDEVICE on ARM is
> > > allocated at runtime as well because the BDF number isn't ready
> > > at the boot time.
> >
> > Oh that's ugly then.. So you'll need to add some kind of 'modify
> > sid/bdf' operation I think.
> 
> But the initial vDEVICE would be still unusable. Its BDF number is
> literally 0 in my case. It can't be used for SID-based invalidation
> nor the reverse vSID lookup for fault injection..
> 

Agree that it makes more sense to allocate vDEVICE at runtime until
it's assigned with a valid bus number. and destroy/recreation is
required when the bus number is changed.

Re: [PATCH 3/3] remoteproc: imx_rproc: add power mode check for remote core attachment

2025-05-06 Thread Peng Fan

On Mon, May 05, 2025 at 12:48:49PM -0300, Hiago De Franco wrote:
>From: Hiago De Franco 
>
>When the remote core is started before Linux boots (e.g., by the
>bootloader), the driver currently is not able to attach because it only
>checks for cores running in different partitions. If the core was kicked
>by the bootloader, it is in the same partition as Linux and it is
>already up and running.
>
>This adds power mode verification through the SCU interface, enabling
>the driver to detect when the remote core is already running and
>properly attach to it.
>
>Signed-off-by: Hiago De Franco 
>Suggested-by: Peng Fan 
>---
> drivers/remoteproc/imx_rproc.c | 23 +++
> 1 file changed, 23 insertions(+)
>
>diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
>index 627e57a88db2..86541d9d8640 100644
>--- a/drivers/remoteproc/imx_rproc.c
>+++ b/drivers/remoteproc/imx_rproc.c
>@@ -8,6 +8,7 @@
> #include 
> #include 
> #include 
>+#include 

Drop this line.

> #include 
> #include 
> #include 
>@@ -906,6 +907,21 @@ static int imx_rproc_attach_pd(struct imx_rproc *priv)
>   return ret < 0 ? ret : 0;
> }
> 
>+static bool imx_rproc_is_on(struct device *dev, struct imx_sc_ipc *ipc,
>+  u32 resource)
>+{
>+  int ret;
>+
>+  ret = imx_sc_pm_get_resource_power_mode(ipc, resource);
>+  if (ret < 0) {
>+  dev_err(dev, "failed to get power resource %d mode, ret %d\n",
>+  resource, ret);
>+  return false;
>+  }
>+
>+  return ret == IMX_SC_PM_PW_MODE_ON;
>+}
>+
> static int imx_rproc_detect_mode(struct imx_rproc *priv)
> {
>   struct regmap_config config = { .name = "imx-rproc" };
>@@ -949,6 +965,13 @@ static int imx_rproc_detect_mode(struct imx_rproc *priv)
>   if (of_property_read_u32(dev->of_node, 
> "fsl,entry-address", &priv->entry))
>   return -EINVAL;
> 
>+  /*
>+   * If remote core is already running (e.g. kicked by
>+   * the bootloader), attach to it.
>+   */
>+  if (imx_rproc_is_on(dev, priv->ipc_handle, 
>priv->rsrc_id))

I prefer "xyz == IMX_SC_PM_PW_MODE_ON"
But anyway, up to you. I not have strong opinion on this.

Regards,
Peng

>+  priv->rproc->state = RPROC_DETACHED;
>+
>   return imx_rproc_attach_pd(priv);
>   }
> 
>-- 
>2.39.5
>

Re: [PATCH v4 14/14] HACK: selftests/nolibc: demonstrate usage of the kselftest harness

2025-05-06 Thread Mark Brown

On Mon, May 05, 2025 at 05:15:32PM +0200, Thomas Weißschuh wrote:

> Show how to use the kselftest harness together with nolibc.
> This just runs the existing harness selftest by crudely replacing the
> regular nolibc-test.c with the harness-selftest.c to get that wired up easily.
> To use it:
>   $ cd tools/testing/selftests/nolibc/
>   $ ./run-tests -m user
> 
> In the future nolibc-test can use the harness for itself.

FWIW there's also several arm64 tests which are written using nolibc
which could usefully be converted to this, they're using nolibc because
they test interfaces intended to be used by libc and so want to avoid
conflicting uses.

signature.asc
Description: PGP signature

Re: [PATCH v3 0/2] Fix two memory leaks in rproc_attach()

2025-05-06 Thread Peng Fan

On Wed, Apr 30, 2025 at 05:20:41PM +0800, Xiaolei Wang wrote:
>In the rproc_attach() function, if rproc_handle_resources() returns
>failure, the resources requested in imx_rproc_prepare() should be
>released, since almost the same thing is done in imx_rproc_prepare() and
>rproc_resource_cleanup(), Function rproc_resource_cleanup() is able
>to deal with empty lists so it is better to fix the "goto" statements
>in rproc_attach(). replace the "unprepare_device" goto statement with
>"clean_up_resources" and get rid of the "unprepare_device" label.
>and rproc->clean_table should also be released
>
>Changes in v3:
>Update patch1, replace the "unprepare_device" goto statement with
>"clean_up_resources" and get rid of the "unprepare_device" label.
>
>V2:
>Updated the commit log of these two patches
>
> https://patchwork.kernel.org/project/linux-remoteproc/patch/20250426065348.1234391-2-xiaolei.w...@windriver.com/
>
> https://patchwork.kernel.org/project/linux-remoteproc/patch/20250426065348.1234391-3-xiaolei.w...@windriver.com/
>
>V1: 
>https://patchwork.kernel.org/project/linux-remoteproc/patch/20250424122252.2777363-1-xiaolei.w...@windriver.com/
>
> https://patchwork.kernel.org/project/linux-remoteproc/patch/20250424122252.2777363-2-xiaolei.w...@windriver.com/
>
>Xiaolei Wang (2):
>  remoteproc: cleanup acquired resources when rproc_handle_resources()
>fails in rproc_attach()
>  remoteproc: core: release rproc->clean_table after rproc_attach()
>fails
>
> drivers/remoteproc/remoteproc_core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Peng Fan

Re: [PATCH] KVM: selftests: add test for SVE host corruption

2025-05-06 Thread Marc Zyngier

On Thu, 17 Apr 2025 00:32:49 +0100, Mark Brown wrote:
> This test program, originally written by Mark Rutland and lightly modified
> by me for upstream, verifies that we do not have the issues with host SVE
> state being discarded which were fixed in
> 
>fbc7e61195e2 ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME 
> state")
> 
> by running a simple VM while checking the SVE register state for
> corruption.
> 
> [...]

Applied to kvm-arm64/misc-6.16, thanks!

[1/1] KVM: selftests: add test for SVE host corruption
  commit: e0ccc45b056d626d4b271820faeedf3837337ceb

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.

Re: [PATCH v2 3/4] gendwarfksyms: Add a kABI rule to override type strings

2025-05-06 Thread Petr Pavlu

On 5/5/25 23:24, Sami Tolvanen wrote:
> In rare situations where distributions must make significant
> changes to otherwise opaque data structures that have
> inadvertently been included in the published ABI, keeping
> symbol versions stable using the existing kABI macros can
> become tedious.
> 
> For example, Android decided to switch to a newer io_uring
> implementation in the 5.10 GKI kernel "to resolve a huge number
> of potential, and known, problems with the codebase," requiring
> "horrible hacks" with genksyms:
> 
>   "A number of the io_uring structures get used in other core
>   kernel structures, only as "opaque" pointers, so there is
>   not any real ABI breakage.  But, due to the visibility of
>   the structures going away, the CRC values of many scheduler
>   variables and functions were changed."
> -- https://r.android.com/2425293
> 
> While these specific changes probably could have been hidden
> from gendwarfksyms using the existing kABI macros, this may not
> always be the case.
> 
> Add a last resort kABI rule that allows distribution
> maintainers to fully override a type string for a symbol or a
> type. Also add a more informative error message in case we find
> a non-existent type references when calculating versions.
> 
> Suggested-by: Giuliano Procida 
> Signed-off-by: Sami Tolvanen 

Reviewed-by: Petr Pavlu 

-- 
Thanks,
Petr

Re: [PATCH v3 3/3] selftests: pidfd: add tests for PIDFD_SELF_*

2025-05-06 Thread Lorenzo Stoakes

On Mon, May 05, 2025 at 03:35:13PM +0200, Christian Brauner wrote:
> I'm completely lost as to what's happening here or whether the test here
> is somehow at fault for something.
>
> The pidfd.h head explicitly has no dependency on the pidfd uapi header
> itself and I will NAK anything that makes it so. It's just a giant pain.

There was a debate in my series here about my having to make things work with
'make headers', but thanks to John we resolved it the sane way by using the
tools/ stuff.

I _believe_ Peter is just using this thread as an example of a recent case of
people being asked to do this insanity (correct me if I'm wrong Peter) and it's
unrelated to pidfs in general.

Re: [PATCH net-next v4 3/3] vsock/test: Expand linger test to ensure close() does not misbehave

2025-05-06 Thread Stefano Garzarella


On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:

There was an issue with SO_LINGER: instead of blocking until all queued
messages for the socket have been successfully sent (or the linger timeout
has been reached), close() would block until packets were handled by the
peer.


This is a new behaviour that only new kernels will follow, so I think
it is better to add a new test instead of extending a pre-existing test
that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".

The old test should continue to check the null-ptr-deref also for old
kernels, while the new test will check the new behaviour, so we can skip
the new test while testing an old kernel.

Thanks,
Stefano



Add a check to alert on close() lingering when it should not.

Signed-off-by: Michal Luczaj 
---
tools/testing/vsock/vsock_test.c | 30 +++---
1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 
d0f6d253ac72d08a957cb81a3c38fcc72bec5a53..82d0bc20dfa75041f04eada1b4310be2f7c3a0c1
 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -1788,13 +1788,16 @@ static void test_stream_connect_retry_server(const 
struct test_opts *opts)
close(fd);
}

+#defineLINGER_TIMEOUT  1   /* seconds */
+
static void test_stream_linger_client(const struct test_opts *opts)
{
struct linger optval = {
.l_onoff = 1,
-   .l_linger = 1
+   .l_linger = LINGER_TIMEOUT
};
-   int fd;
+   int bytes_unsent, fd;
+   time_t ts;

fd = vsock_stream_connect(opts->peer_cid, opts->peer_port);
if (fd < 0) {
@@ -1807,7 +1810,28 @@ static void test_stream_linger_client(const struct 
test_opts *opts)
exit(EXIT_FAILURE);
}

+   /* Byte left unread to expose any incorrect behaviour. */
+   send_byte(fd, 1, 0);
+
+   /* Reuse LINGER_TIMEOUT to wait for bytes_unsent == 0. */
+   timeout_begin(LINGER_TIMEOUT);
+   do {
+   if (ioctl(fd, SIOCOUTQ, &bytes_unsent) < 0) {
+   perror("ioctl(SIOCOUTQ)");
+   exit(EXIT_FAILURE);
+   }
+   timeout_check("ioctl(SIOCOUTQ) == 0");
+   } while (bytes_unsent != 0);
+   timeout_end();
+
+   ts = current_nsec();
close(fd);
+   if ((current_nsec() - ts) / NSEC_PER_SEC > 0) {
+   fprintf(stderr, "Unexpected lingering on close()\n");
+   exit(EXIT_FAILURE);
+   }
+
+   control_writeln("DONE");
}

static void test_stream_linger_server(const struct test_opts *opts)
@@ -1820,7 +1844,7 @@ static void test_stream_linger_server(const struct 
test_opts *opts)
exit(EXIT_FAILURE);
}

-   vsock_wait_remote_close(fd);
+   control_expectln("DONE");
close(fd);
}


--
2.49.0

Re: [PATCH net-next v4 1/3] vsock/virtio: Linger on unsent data

2025-05-06 Thread Stefano Garzarella


On Thu, May 01, 2025 at 10:05:22AM +0200, Michal Luczaj wrote:

Currently vsock's lingering effectively boils down to waiting (or timing
out) until packets are consumed or dropped by the peer; be it by receiving
the data, closing or shutting down the connection.

To align with the semantics described in the SO_LINGER section of man
socket(7) and to mimic AF_INET's behaviour more closely, change the logic
of a lingering close(): instead of waiting for all data to be handled,
block until data is considered sent from the vsock's transport point of
view. That is until worker picks the packets for processing and decrements
virtio_vsock_sock::bytes_unsent down to 0.

Note that (some interpretation of) lingering was always limited to
transports that called virtio_transport_wait_close() on transport release.
This does not change, i.e. under Hyper-V and VMCI no lingering would be
observed.

The implementation does not adhere strictly to man page's interpretation of
SO_LINGER: shutdown() will not trigger the lingering. This follows AF_INET.

Signed-off-by: Michal Luczaj 
---
net/vmw_vsock/virtio_transport_common.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)


Reviewed-by: Stefano Garzarella 



diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
index 
7f7de6d8809655fe522749fbbc9025df71f071bd..045ac53f69735e1979162aea8c9ab5961407640c
 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1196,12 +1196,14 @@ static void virtio_transport_wait_close(struct sock 
*sk, long timeout)
{
if (timeout) {
DEFINE_WAIT_FUNC(wait, woken_wake_function);
+   struct vsock_sock *vsk = vsock_sk(sk);

add_wait_queue(sk_sleep(sk), &wait);

do {
if (sk_wait_event(sk, &timeout,
- sock_flag(sk, SOCK_DONE), &wait))
+ virtio_transport_unsent_bytes(vsk) == 
0,
+ &wait))
break;
} while (!signal_pending(current) && timeout);


--
2.49.0

[PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces

2025-05-06 Thread Juergen Gross

Instead of having callback functions for rdmsr/wrmsr on native, switch
to inline the respective instructions directly in order to avoid
overhead with the call interface.

This requires to use the instruction interfaces for rdmsr/wrmsr
emulation when running as a Xen PV guest.

In order to prepare support for the immediate forms of RDMSR and WRMSR
when not running as a Xen PV guest, use the RDMSR and WRMSR
instructions as the fallback case instead of ALT_CALL_INSTR.

Note that in the Xen PV case the RDMSR/WRMSR patching must not happen
even as an intermediate step, as this would clobber the indirect call
information needed when patching in the direct call for the Xen case.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/paravirt.h   | 114 +-
 arch/x86/include/asm/paravirt_types.h |  13 ++-
 arch/x86/include/asm/qspinlock_paravirt.h |   5 +-
 arch/x86/kernel/paravirt.c|  26 -
 arch/x86/xen/enlighten_pv.c   |  56 ---
 5 files changed, 167 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a463c747c780..df10b0e4f7b8 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -175,24 +175,72 @@ static inline void __write_cr4(unsigned long x)
PVOP_VCALL1(cpu.write_cr4, x);
 }
 
-static inline u64 paravirt_read_msr(u32 msr)
+static __always_inline u64 paravirt_read_msr(u32 msr)
 {
-   return PVOP_CALL1(u64, cpu.read_msr, msr);
+   EAX_EDX_DECLARE_ARGS(val, low, high);
+
+   PVOP_TEST_NULL(cpu.read_msr);
+   asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
+   "rdmsr", ALT_NOT_XEN,
+   ALT_CALL_INSTR, ALT_XENPV_CALL)
+"2:\n"
+_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
+: EAX_EDX_RET(val, low, high), ASM_CALL_CONSTRAINT
+: paravirt_ptr(cpu.read_msr), "c" (msr));
+
+   return EAX_EDX_VAL(val, low, high);
 }
 
-static inline void paravirt_write_msr(u32 msr, u64 val)
+static __always_inline void paravirt_write_msr(u32 msr, u64 val)
 {
-   PVOP_VCALL2(cpu.write_msr, msr, val);
+   PVOP_TEST_NULL(cpu.write_msr);
+   asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
+   "wrmsr", ALT_NOT_XEN,
+   ALT_CALL_INSTR, ALT_XENPV_CALL)
+ "2:\n"
+ _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
+ : ASM_CALL_CONSTRAINT
+ : paravirt_ptr(cpu.write_msr),
+ "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32))
+ : "memory");
 }
 
-static inline int paravirt_read_msr_safe(u32 msr, u64 *val)
+static __always_inline int paravirt_read_msr_safe(u32 msr, u64 *p)
 {
-   return PVOP_CALL2(int, cpu.read_msr_safe, msr, val);
+   int err;
+   EAX_EDX_DECLARE_ARGS(val, low, high);
+
+   PVOP_TEST_NULL(cpu.read_msr_safe);
+   asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
+   "rdmsr; xor %[err],%[err]", ALT_NOT_XEN,
+   ALT_CALL_INSTR, ALT_XENPV_CALL)
+"2:\n"
+_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
+: [err] "=c" (err), EAX_EDX_RET(val, low, high),
+  ASM_CALL_CONSTRAINT
+: paravirt_ptr(cpu.read_msr_safe), "0" (msr));
+
+   *p = EAX_EDX_VAL(val, low, high);
+
+   return err;
 }
 
-static inline int paravirt_write_msr_safe(u32 msr, u64 val)
+static __always_inline int paravirt_write_msr_safe(u32 msr, u64 val)
 {
-   return PVOP_CALL2(int, cpu.write_msr_safe, msr, val);
+   int err;
+
+   PVOP_TEST_NULL(cpu.write_msr_safe);
+   asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
+   "wrmsr; xor %[err],%[err]", ALT_NOT_XEN,
+   ALT_CALL_INSTR, ALT_XENPV_CALL)
+"2:\n"
+_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
+: [err] "=a" (err), ASM_CALL_CONSTRAINT
+: paravirt_ptr(cpu.write_msr_safe),
+  "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
+: "memory");
+
+   return err;
 }
 
 static __always_inline u64 read_msr(u32 msr)
@@ -573,27 +621,43 @@ bool __raw_callee_save___native_vcpu_is_preempted(long 
cpu);
 #define PV_SAVE_ALL_CALLER_REGS"pushl %ecx;"
 #define PV_RESTORE_ALL_CALLER_REGS "popl  %ecx;"
 #else
+/* save and restore caller-save registers, except %rax, %rcx and %rdx. */
+#define PV_SAVE_COMMON_CALLER_REGS \
+   "push %rsi;"\
+   "push %rdi;"\
+   "push %r8;" \
+   "push %r9;"

Re: [PATCH net-next v4 2/3] vsock: Move lingering logic to af_vsock core

2025-05-06 Thread Stefano Garzarella


On Thu, May 01, 2025 at 10:05:23AM +0200, Michal Luczaj wrote:

Lingering should be transport-independent in the long run. In preparation
for supporting other transports, as well the linger on shutdown(), move
code to core.

Generalize by querying vsock_transport::unsent_bytes(), guard against the
callback being unimplemented. Do not pass sk_lingertime explicitly. Pull
SOCK_LINGER check into vsock_linger().

Flatten the function. Remove the nested block by inverting the condition:
return early on !timeout.

Suggested-by: Stefano Garzarella 
Signed-off-by: Michal Luczaj 
---
include/net/af_vsock.h  |  1 +
net/vmw_vsock/af_vsock.c| 30 ++
net/vmw_vsock/virtio_transport_common.c | 23 ++-
3 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 
9e85424c834353d016a527070dd62e15ff3bfce1..d56e6e135158939087d060dfcf65d3fdaea53bf3
 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -221,6 +221,7 @@ void vsock_for_each_connected_socket(struct vsock_transport 
*transport,
 void (*fn)(struct sock *sk));
int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock 
*psk);

bool vsock_find_cid(unsigned int cid);
+void vsock_linger(struct sock *sk);

/ TAP /

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 
fc6afbc8d6806a4d98c66abc3af4bd139c583b08..a31ad6b141cd38d1806df4b5d417924bb8607e32
 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1013,6 +1013,36 @@ static int vsock_getname(struct socket *sock,
return err;
}

+void vsock_linger(struct sock *sk)
+{
+   DEFINE_WAIT_FUNC(wait, woken_wake_function);
+   ssize_t (*unsent)(struct vsock_sock *vsk);
+   struct vsock_sock *vsk = vsock_sk(sk);
+   long timeout;
+
+   if (!sock_flag(sk, SOCK_LINGER))
+   return;
+
+   timeout = sk->sk_lingertime;
+   if (!timeout)
+   return;
+
+   /* unsent_bytes() may be unimplemented. */


This comment IMO should be enriched, as it is now it doesn't add much to 
the code. I'm thinking on something like this:

Transports must implement `unsent_bytes` if they want to support
SOCK_LINGER through `vsock_linger()` since we use it to check when
the socket can be closed.

The rest LGTM!

Thanks,
Stefano


+   unsent = vsk->transport->unsent_bytes;
+   if (!unsent)
+   return;
+
+   add_wait_queue(sk_sleep(sk), &wait);
+
+   do {
+   if (sk_wait_event(sk, &timeout, unsent(vsk) == 0, &wait))
+   break;
+   } while (!signal_pending(current) && timeout);
+
+   remove_wait_queue(sk_sleep(sk), &wait);
+}
+EXPORT_SYMBOL_GPL(vsock_linger);
+
static int vsock_shutdown(struct socket *sock, int mode)
{
int err;
diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
index 
045ac53f69735e1979162aea8c9ab5961407640c..aa308f285bf1bcf4c689407033de854c6f85a639 
100644

--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1192,25 +1192,6 @@ static void virtio_transport_remove_sock(struct 
vsock_sock *vsk)
vsock_remove_sock(vsk);
}

-static void virtio_transport_wait_close(struct sock *sk, long timeout)
-{
-   if (timeout) {
-   DEFINE_WAIT_FUNC(wait, woken_wake_function);
-   struct vsock_sock *vsk = vsock_sk(sk);
-
-   add_wait_queue(sk_sleep(sk), &wait);
-
-   do {
-   if (sk_wait_event(sk, &timeout,
- virtio_transport_unsent_bytes(vsk) == 
0,
- &wait))
-   break;
-   } while (!signal_pending(current) && timeout);
-
-   remove_wait_queue(sk_sleep(sk), &wait);
-   }
-}
-
static void virtio_transport_cancel_close_work(struct vsock_sock *vsk,
   bool cancel_timeout)
{
@@ -1280,8 +1261,8 @@ static bool virtio_transport_close(struct vsock_sock *vsk)
if ((sk->sk_shutdown & SHUTDOWN_MASK) != SHUTDOWN_MASK)
(void)virtio_transport_shutdown(vsk, SHUTDOWN_MASK);

-   if (sock_flag(sk, SOCK_LINGER) && !(current->flags & PF_EXITING))
-   virtio_transport_wait_close(sk, sk->sk_lingertime);
+   if (!(current->flags & PF_EXITING))
+   vsock_linger(sk);

if (sock_flag(sk, SOCK_DONE)) {
return true;

--
2.49.0

Re: [PATCH net-next v4 3/3] vsock/test: Expand linger test to ensure close() does not misbehave

2025-05-06 Thread Stefano Garzarella

On Tue, 6 May 2025 at 11:43, Stefano Garzarella  wrote:
>
> On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:
> >There was an issue with SO_LINGER: instead of blocking until all queued
> >messages for the socket have been successfully sent (or the linger timeout
> >has been reached), close() would block until packets were handled by the
> >peer.
>
> This is a new behaviour that only new kernels will follow, so I think
> it is better to add a new test instead of extending a pre-existing test
> that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".
>
> The old test should continue to check the null-ptr-deref also for old
> kernels, while the new test will check the new behaviour, so we can skip
> the new test while testing an old kernel.

I also saw that we don't have any test to verify that actually the
lingering is working, should we add it since we are touching it?

Thanks,
Stefano

Re: [BUG] RCU: Kernel panic on execute from non-executable memory (binderfs + stress-ng)

2025-05-06 Thread Yunseong Kim

Hi,

I attached the vmcore with vmlinux symbol for further analysis and will share 
it at the following link.

Link: 
https://drive.google.com/file/d/1_RFdpdWNuLdO-Yx6d7vIX-WAFX4X_msH/view?usp=drive_link

On 5/6/25 9:30 오전, Yunseong Kim wrote:
> Hi Colin,
> 
>>> The crash seems to originate from rcu_do_batch(), jumping to a pointer 
>>> (0x3a114000) that appears to be non-executable.
>>> The PTE for the address confirms XN=1. Given the heavy binderfs workload, I 
>>> suspect there may be a use-after-free or dangling pointer involved in a 
>>> callback invocation.
>>>
>>> Platform:
>>>   Architecture: arm64
>>>   Virtualized environment: Apple Silicon M2 (Apple Virtualization Framework)
>>>   Kernel version: 6.15.0-rc4+
>>>   Attached Config: CONFIG_PREEMPT_VOLUNTARY=y, CONFIG_KASAN=y
>>>
>>> Reproducer:
>>>   sudo ./stress-ng --binderfs 8 --binderfs-ops 1 -t 15 \
>>>    --pathological --timestamp --tz --syslog --perf --no-rand-seed \
>>>    --times --metrics --klog-check --status 5 -x smi -v --interrupts 
>>> --change-cpu
>>
>>
>> I suspect --change-cpu is required to trigger this issue. Does it trigger 
>> without this option?  Can you reproduce the issue when reducing the number 
>> of --binderfs intances?
> 
> As you suggested, I've been testing combinations of enabling and disabling 
> '--binderfs' and '--change-cpu' separately.
> 
> While I'm not deeply familiar with the internal mechanisms of binderfs,
> I found that the panic still occurs consistently with --binderfs, even 
> without the --change-cpu option.
> However, the panic occurred when I used 4 instances.
> 
> Reproducer with reduced instance from 8 to 4 and without change-cpu option:
>  sudo ./stress-ng --binderfs 4 --binderfs-ops 1 -t 15 \
>   --pathological --timestamp --tz --syslog --perf --no-rand-seed \
>   --times --metrics --klog-check --status 5 -x smi -v --interrupts

[  194.911021] Unable to handle kernel execute from non-executable memory at 
virtual address 312ebe00
[  194.911043] Mem abort info:
[  194.911056]   ESR = 0x860f
[  194.911065]   EC = 0x21: IABT (current EL), IL = 32 bits
[  194.98]   SET = 0, FnV = 0
[  194.911130]   EA = 0, S1PTW = 0
[  194.911139]   FSC = 0x0f: level 3 permission fault
[  194.911149] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000163388000
[  194.911160] [312ebe00] pgd=18016403, p4d=18016403, 
pud=18016fffe403, pmd=18016fff4403, pte=0068712eb707
[  194.911201] Internal error: Oops: 860f [#1]  SMP
[  194.911211] Modules linked in: overlay isofs uinput snd_seq_dummy 
snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet 
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 rfkill ip_set nf_tables qrtr sunrpc virtio_snd snd_seq 
snd_seq_device snd_pcm snd_timer snd virtio_net soundcore net_failover 
virtio_balloon failover vfat fat joydev loop nfnetlink vsock_loopback 
vmw_vsock_virtio_transport_common zram vmw_vsock_vmci_transport lz4hc_compress 
vmw_vmci lz4_compress vsock uas polyval_ce polyval_generic ghash_ce usb_storage 
sha3_ce sha512_ce virtio_gpu sha512_arm64 virtio_dma_buf apple_mfi_fastcharge 
fuse
[  194.911440] CPU: 2 UID: 0 PID: 27 Comm: ksoftirqd/2 Kdump: loaded Not 
tainted 6.15.0-rc4+ #1 PREEMPT(voluntary) 
[  194.911452] Hardware name: Apple Inc. Apple Virtualization Generic Platform, 
BIOS 2075.101.2.0.0 03/12/2025
[  194.911459] pstate: 21400805 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=-c)
[  194.911469] pc : 0x312ebe00
[  194.911492] lr : rcu_do_batch+0x2dc/0x860
[  194.911506] sp : 800080143c90
[  194.911512] x29: 800080143cb0 x28: 3059e600 x27: 312ebe00
[  194.911534] x26: 800084442000 x25:  x24: 8000843d9b18
[  194.911549] x23: 800082150ac0 x22: 0003 x21: 000a
[  194.911563] x20: c11233c0 x19: 00012f0e1e00 x18: 
[  194.911578] x17: 80008214506c x16: 002c7c955b9d7f7c x15: 
[  194.911592] x14: 0002 x13: 00ff0100 x12: 8000801c3410
[  194.911607] x11: 00180017 x10: 00ff0100 x9 : 80008385a580
[  194.911621] x8 : 00010100 x7 : 7f7f7f7f7f7f7f7f x6 : 8000803f89bc
[  194.911636] x5 :  x4 :  x3 : 0002
[  194.911650] x2 :  x1 : 800082a4aeb8 x0 : 3059e600
[  194.911744] Call trace:
[  194.911747]  0x312ebe00 (P)
[  194.911759]  rcu_core+0x2a0/0x4e8
[  194.911767]  rcu_core_si+0x1c/0x30
[  194.911773]  handle_softirqs+0x1b4/0x588
[  194.911782]  run_ksoftirqd+0x5c/0xf8
[  194.911787]  smpboot_thread_fn+0x27c/0x490
[  194.911794]  kthread+0x2ac/0x318
[  194.911802]  ret_from_fork+0x10/0x20
[  194.911811] Code:     () 
[  194.911821] SMP: stopping secondary CPUs
[  194.9

Re: [PATCH v4 00/38] Mediated vPMU 4.0 for x86

2025-05-06 Thread Mi, Dapeng

Hi Sean,

Not sure if you have bandwidth to review this mediated vPMU v4 patchset?
All your comments in v3 patchset have been addressed.

Thanks.

Dapeng Mi

On 3/25/2025 1:30 AM, Mingwei Zhang wrote:
> With joint effort from the upstream KVM community, we come up with the
> 4th version of mediated vPMU for x86. We have made the following changes
> on top of the previous RFC v3.
>
> v3 -> v4
>  - Rebase whole patchset on 6.14-rc3 base.
>  - Address Peter's comments on Perf part.
>  - Address Sean's comments on KVM part.
>* Change key word "passthrough" to "mediated" in all patches
>* Change static enabling to user space dynamic enabling via 
> KVM_CAP_PMU_CAPABILITY.
>* Only support GLOBAL_CTRL save/restore with VMCS exec_ctrl, drop the MSR
>  save/retore list support for GLOBAL_CTRL, thus the support of mediated
>  vPMU is constrained to SapphireRapids and later CPUs on Intel side.
>* Merge some small changes into a single patch.
>  - Address Sandipan's comment on invalid pmu pointer.
>  - Add back "eventsel_hw" and "fixed_ctr_ctrl_hw" to avoid to directly
>manipulate pmc->eventsel and pmu->fixed_ctr_ctrl.
>
>
> Testing (Intel side):
>  - Perf-based legacy vPMU (force emulation on/off)
>* Kselftests pmu_counters_test, pmu_event_filter_test and
>  vmx_pmu_caps_test pass.
>* KUT PMU tests pmu, pmu_lbr, pmu_pebs pass.
>* Basic perf counting/sampling tests in 3 scenarios, guest-only,
>  host-only and host-guest coexistence all pass.
>
>  - Mediated vPMU (force emulation on/off)
>* Kselftests pmu_counters_test, pmu_event_filter_test and
>  vmx_pmu_caps_test pass.
>* KUT PMU tests pmu, pmu_lbr, pmu_pebs pass.
>* Basic perf counting/sampling tests in 3 scenarios, guest-only,
>  host-only and host-guest coexistence all pass.
>
>  - Failures. All above tests passed on Intel Granite Rapids as well
>except a failure on KUT/pmu_pebs.
>* GP counter 0 (0xfffe): PEBS record (written seq 0)
>  is verified (including size, counters and cfg).
>* The pebs_data_cfg (0xb5) doesn't match with the
>  effective MSR_PEBS_DATA_CFG (0x0).
>* This failure has nothing to do with this mediated vPMU patch set. The
>  failure is caused by Granite Rapids supported timed PEBS which needs
>  extra support on Qemu and KUT/pmu_pebs. These extra support would be
>  sent in separate patches later.
>
>
> Testing (AMD side):
>  - Kselftests pmu_counters_test, pmu_event_filter_test and
>vmx_pmu_caps_test all pass
>
>  - legacy guest with KUT/pmu:
>* qmeu option: -cpu host, -perfctr-core
>* when set force_emulation_prefix=1, passes
>* when set force_emulation_prefix=0, passes
>  - perfmon-v1 guest with KUT/pmu:
>* qmeu option: -cpu host, -perfmon-v2
>* when set force_emulation_prefix=1, passes
>* when set force_emulation_prefix=0, passes
>  - perfmon-v2 guest with KUT/pmu:
>* qmeu option: -cpu host
>* when set force_emulation_prefix=1, passes
>* when set force_emulation_prefix=0, passes
>
>  - perf_fuzzer (perfmon-v2):
>* fails with soft lockup in guest in current version.
>* culprit could be between 6.13 ~ 6.14-rc3 within KVM
>* Series tested on 6.12 and 6.13 without issue.
>
> Note: a QEMU series is needed to run mediated vPMU v4:
>  - 
> https://lore.kernel.org/all/20250324123712.34096-1-dapeng1...@linux.intel.com/
>
> History:
>  - RFC v3: 
> https://lore.kernel.org/all/20240801045907.4010984-1-mizh...@google.com/
>  - RFC v2: 
> https://lore.kernel.org/all/20240506053020.3911940-1-mizh...@google.com/
>  - RFC v1: 
> https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zh...@linux.intel.com/
>
>
> Dapeng Mi (18):
>   KVM: x86/pmu: Introduce enable_mediated_pmu global parameter
>   KVM: x86/pmu: Check PMU cpuid configuration from user space
>   KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers
>   KVM: x86/pmu: Add perf_capabilities field in struct kvm_host_values{}
>   KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header
>   KVM: VMX: Add macros to wrap around
> {secondary,tertiary}_exec_controls_changebit()
>   KVM: x86/pmu: Check if mediated vPMU can intercept rdpmc
>   KVM: x86/pmu/vmx: Save/load guest IA32_PERF_GLOBAL_CTRL with
> vm_exit/entry_ctrl
>   KVM: x86/pmu: Optimize intel/amd_pmu_refresh() helpers
>   KVM: x86/pmu: Setup PMU MSRs' interception mode
>   KVM: x86/pmu: Handle PMU MSRs interception and event filtering
>   KVM: x86/pmu: Switch host/guest PMU context at vm-exit/vm-entry
>   KVM: x86/pmu: Handle emulated instruction for mediated vPMU
>   KVM: nVMX: Add macros to simplify nested MSR interception setting
>   KVM: selftests: Add mediated vPMU supported for pmu tests
>   KVM: Selftests: Support mediated vPMU for vmx_pmu_caps_test
>   KVM: Selftests: Fix pmu_counters_test error for mediated vPMU
>   KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space
>
> Kan Liang (8):
>   perf: Support get/put mediated PMU int

Re: [PATCH] kbuild: use ARCH from compile.h in unclean source tree msg

2025-05-06 Thread Nicolas Schier

On Fri, 02 May 2025, Shuah Khan wrote:

> When make finds the source tree unclean, it prints a message to run
> "make ARCH=x86_64 mrproper" message using the ARCH from the command
> line. The ARCH specified in the command line could be different from
> the ARCH of the existing build in the source tree.
> 
> This could cause problems in regular kernel build and kunit workflows.
> 
> Regular workflow:
> 
> - Build x86_64 kernel
>   $ make ARCH=x86_64
> - Try building another arch kernel out of tree with O=
>   $ make ARCH=um O=/linux/build
> - kbuild detects source tree is unclean
> 
>   ***
>   *** The source tree is not clean, please run 'make ARCH=um mrproper'
>   *** in /linux/linux_srcdir
>   ***
> 
> - Clean source tree as suggested by kbuild
>   $ make ARCH=um mrproper
> - Source clean appears to be clean, but it leaves behind generated header
>   files under arch/x86
>   arch/x86/realmode/rm/pasyms.h
> 
> A subsequent x86_64e build fails with
>   "undefined symbol sev_es_trampoline_start referenced ..."
> 
> kunit workflow runs into this issue:
> 
> - Build x86_64 kernel
> - Run kunit tests:  it tries to build for user specified ARCH or uml
>   as default:
>   $ ./tools/testing/kunit/kunit.py run
> 
> - kbuild detects unclean source tree
> 
>   ***
>   *** The source tree is not clean, please run 'make ARCH=um mrproper'
>   *** in /linux/linux_6.15
>   ***
> 
> - Clean source tree as suggested by kbuild
>   $ make ARCH=um mrproper
> - Source clean appears to be clean, but it leaves behind generated header
>   files under arch/x86
> 
> The problem shows when user tries to run tests on ARCH=x86_64:
> 
>   $ ./tools/testing/kunit/kunit.py run ARCH=x86_64
> 
>   "undefined symbol sev_es_trampoline_start referenced ..."
> 
> Build trips on arch/x86/realmode/rm/pasyms.h left behind by a prior
> x86_64 build.
> 
> Problems related to partially cleaned source tree are hard to debug.
> Change Makefile to unclean source logic to use ARCH from compile.h
> UTS_MACHINE string. With this change kbuild prints:
> 
>   $ ./tools/testing/kunit/kunit.py run
> 
>   ***
>   *** The source tree is not clean, please run 'make ARCH=x86_64 mrproper'
>   *** in /linux/linux_6.15
>   ***
> 
> Signed-off-by: Shuah Khan 
> ---
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index 5aa9ee52a765..7ee29136b4da 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -674,7 +674,7 @@ ifeq ($(KBUILD_EXTMOD),)
>-d $(srctree)/include/config -o \
>-d $(srctree)/arch/$(SRCARCH)/include/generated ]; then \
>   echo >&2 "***"; \
> - echo >&2 "*** The source tree is not clean, please run 
> 'make$(if $(findstring command line, $(origin ARCH)), ARCH=$(ARCH)) 
> mrproper'"; \
> + echo >&2 "*** The source tree is not clean, please run 'make 
> ARCH=$(shell grep UTS_MACHINE $(srctree)/include/generated/compile.h | cut -d 
> '"' -f 2) mrproper'"; \

Please 'grep' option '-s'.

There are some (rare) occassions, when there is no include/generated/compile.h
but still the source tree will be considered to be dirty:

grep: ../include/generated/compile.h: No such file or directory
***
*** The source tree is not clean, please run 'make ARCH= mrproper'
...

Kind regards,
Nicolas

Re: [PATCH v3 2/5] ASoC: qcom: sm8250: set card driver name from match data

2025-05-06 Thread Luca Weiss

Hi Srini,

On Fri May 2, 2025 at 1:06 PM CEST, Srinivas Kandagatla wrote:
> On 5/1/25 15:13, Luca Weiss wrote:
>> Hi Srini,
>> 
>> Srinivas Kandagatla  schreef op 1 mei 2025 13:37:45 CEST:
>>> On Fri, Apr 25, 2025 at 10:07:26AM +0200, Luca Weiss wrote:
 Sound machine drivers for Qualcomm SoCs can be reused across multiple
 SoCs. But user space ALSA UCM files depend on the card driver name which
 should be set per board/SoC.

 Allow such customization by using driver match data as sound card driver
 name.

 Also while we're already touching these lines, sort the compatibles
 alphabetically.

 Reviewed-by: Dmitry Baryshkov 
 Reviewed-by: Neil Armstrong 
 Signed-off-by: Luca Weiss 
 ---
  sound/soc/qcom/sm8250.c | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)

 diff --git a/sound/soc/qcom/sm8250.c b/sound/soc/qcom/sm8250.c
 index 
 b70b2a5031dfbf69024666f8a1049c263efcde0a..e920b413b762c803cfcc4049f35deba828275478
  100644
 --- a/sound/soc/qcom/sm8250.c
 +++ b/sound/soc/qcom/sm8250.c
 @@ -16,7 +16,6 @@
  #include "usb_offload_utils.h"
  #include "sdw.h"
  
 -#define DRIVER_NAME   "sm8250"
  #define MI2S_BCLK_RATE1536000
  
  struct sm8250_snd_data {
 @@ -200,15 +199,15 @@ static int sm8250_platform_probe(struct 
 platform_device *pdev)
if (ret)
return ret;
  
 -  card->driver_name = DRIVER_NAME;
 +  card->driver_name = of_device_get_match_data(dev);
sm8250_add_be_ops(card);
return devm_snd_soc_register_card(dev, card);
  }
  
  static const struct of_device_id snd_sm8250_dt_match[] = {
 -  {.compatible = "qcom,sm8250-sndcard"},
 -  {.compatible = "qcom,qrb4210-rb2-sndcard"},
 -  {.compatible = "qcom,qrb5165-rb5-sndcard"},
 +  { .compatible = "qcom,qrb4210-rb2-sndcard", .data = "sm8250" },
>>>
>>> sm4250 for rb2?
>> 
>> Since this name is visible to user space and used for picking the UCM 
>> config, I don't think it's a good idea to change it.
>> 
> It is not correct to pretend that rb2 is sm8250 for ucm cases, I agree 
> previous code was
> already doing this, Good thing is that we do not have a ucm written yet for 
> RB2.
>
> Lets fix this as you are already doing this for other compatibles.

Okay, will change this in v4.

Regards
Luca

>
> --srini
>
>> Regards
>> Luca
>> 
>>>
 +  { .compatible = "qcom,qrb5165-rb5-sndcard", .data = "sm8250" },
 +  { .compatible = "qcom,sm8250-sndcard", .data = "sm8250" },
{}
  };
  

 -- 
 2.49.0

Re: [PATCH v2 06/19] iommufd/viommu: Add IOMMU_VIOMMU_SET/UNSET_VDEV_ID ioctl

2025-05-06 Thread Jason Gunthorpe

On Mon, May 05, 2025 at 07:53:44PM -0700, Nicolin Chen wrote:
> On Mon, May 05, 2025 at 02:08:07PM -0300, Jason Gunthorpe wrote:
> > On Wed, Apr 30, 2025 at 12:58:47AM -0700, Nicolin Chen wrote:
> > 
> > > > ... and I just hit a problem with it - this is basically guest BDFn
> > > > and it works as long as I'm hotplugging the TEE-IO VF into an SNP VM
> > > > but does not when I pass through via the QEMU cmdline - bus numbers
> > > > are not assigned yet. So I have to postpone the vdevice allocation
> > > > till run time, did I miss something here? Thanks,
> > > 
> > > I have a similar case with QEMU ARM64's VM: so vDEVICE on ARM is
> > > allocated at runtime as well because the BDF number isn't ready
> > > at the boot time.
> > 
> > Oh that's ugly then.. So you'll need to add some kind of 'modify
> > sid/bdf' operation I think.
> 
> But the initial vDEVICE would be still unusable. Its BDF number is
> literally 0 in my case. It can't be used for SID-based invalidation
> nor the reverse vSID lookup for fault injection..

That's fine, that is actually what it is in the vPCI topology. Until
the bus numbers are assigned at least.

So you'd have SID conflicts in the kernel, just pick the first one or
something until it gets sorted out.

> > The bus numbers can be reassigned at any time on the fly by the guest
> > by reprogramming the PCI hierarchy.
> 
> Yes. If we take some aggressive use case into account, where its
> BDF number could change multiple times, I think it's natural for
> VMM to simply destroy the previous vDEVICE and allocate a new one
> with a new BDF number, right?

We should not destroy the vdevice for something like that. In a CC
case that would unplug it from the VM which is not right.

Jason

[PATCH v12 2/3] rust: add parameter support to the `module!` macro

2025-05-06 Thread Andreas Hindborg

Add support for module parameters to the `module!` macro. Implement read
only support for integer types without `sysfs` support.

Acked-by: Petr Pavlu  # from modules perspective
Tested-by: Daniel Gomez 
Reviewed-by: Greg Kroah-Hartman 
Signed-off-by: Andreas Hindborg 
---
 rust/kernel/lib.rs   |   1 +
 rust/kernel/module_param.rs  | 204 +++
 rust/macros/helpers.rs   |  25 ++
 rust/macros/lib.rs   |  31 +++
 rust/macros/module.rs| 195 -
 samples/rust/rust_minimal.rs |  10 +++
 6 files changed, 446 insertions(+), 20 deletions(-)

diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index de07aadd1ff5..9afb7eae9183 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -61,6 +61,7 @@
 pub mod kunit;
 pub mod list;
 pub mod miscdevice;
+pub mod module_param;
 #[cfg(CONFIG_NET)]
 pub mod net;
 pub mod of;
diff --git a/rust/kernel/module_param.rs b/rust/kernel/module_param.rs
new file mode 100644
index ..ba80c9c87317
--- /dev/null
+++ b/rust/kernel/module_param.rs
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Support for module parameters.
+//!
+//! C header: 
[`include/linux/moduleparam.h`](srctree/include/linux/moduleparam.h)
+
+use crate::prelude::*;
+use crate::str::BStr;
+
+/// Newtype to make `bindings::kernel_param` [`Sync`].
+#[repr(transparent)]
+#[doc(hidden)]
+pub struct RacyKernelParam(pub ::kernel::bindings::kernel_param);
+
+// SAFETY: C kernel handles serializing access to this type. We never access it
+// from Rust module.
+unsafe impl Sync for RacyKernelParam {}
+
+/// Types that can be used for module parameters.
+pub trait ModuleParam: Sized + Copy {
+/// The [`ModuleParam`] will be used by the kernel module through this 
type.
+///
+/// This may differ from `Self` if, for example, `Self` needs to track
+/// ownership without exposing it or allocate extra space for other 
possible
+/// parameter values.
+// This is required to support string parameters in the future.
+type Value: ?Sized;
+
+/// Parse a parameter argument into the parameter value.
+///
+/// `Err(_)` should be returned when parsing of the argument fails.
+///
+/// Parameters passed at boot time will be set before [`kmalloc`] is
+/// available (even if the module is loaded at a later time). However, in
+/// this case, the argument buffer will be valid for the entire lifetime of
+/// the kernel. So implementations of this method which need to allocate
+/// should first check that the allocator is available (with
+/// [`crate::bindings::slab_is_available`]) and when it is not available
+/// provide an alternative implementation which doesn't allocate. In cases
+/// where the allocator is not available it is safe to save references to
+/// `arg` in `Self`, but in other cases a copy should be made.
+///
+/// [`kmalloc`]: srctree/include/linux/slab.h
+fn try_from_param_arg(arg: &'static BStr) -> Result;
+}
+
+/// Set the module parameter from a string.
+///
+/// Used to set the parameter value at kernel initialization, when loading
+/// the module or when set through `sysfs`.
+///
+/// See `struct kernel_param_ops.set`.
+///
+/// # Safety
+///
+/// - If `val` is non-null then it must point to a valid null-terminated 
string.
+///   The `arg` field of `param` must be an instance of `T`.
+/// - `param.arg` must be a pointer to valid `*mut T` as set up by the
+///   [`module!`] macro.
+///
+/// # Invariants
+///
+/// Currently, we only support read-only parameters that are not readable
+/// from `sysfs`. Thus, this function is only called at kernel
+/// initialization time, or at module load time, and we have exclusive
+/// access to the parameter for the duration of the function.
+///
+/// [`module!`]: macros::module
+unsafe extern "C" fn set_param(
+val: *const kernel::ffi::c_char,
+param: *const crate::bindings::kernel_param,
+) -> core::ffi::c_int
+where
+T: ModuleParam,
+{
+// NOTE: If we start supporting arguments without values, val _is_ allowed
+// to be null here.
+if val.is_null() {
+// TODO: Use pr_warn_once available.
+crate::pr_warn!("Null pointer passed to `module_param::set_param`");
+return EINVAL.to_errno();
+}
+
+// SAFETY: By function safety requirement, val is non-null and
+// null-terminated. By C API contract, `val` is live and valid for reads
+// for the duration of this function.
+let arg = unsafe { CStr::from_char_ptr(val) };
+
+crate::error::from_result(|| {
+let new_value = T::try_from_param_arg(arg)?;
+
+// SAFETY: `param` is guaranteed to be valid by C API contract
+// and `arg` is guaranteed to point to an instance of `T`.
+let old_value = unsafe { (*param).__bindgen_anon_1.arg as *mut T };
+
+// SAFETY: `old_value` is valid for writes, as we have exclusive
+

[PATCH v12 0/3] rust: extend `module!` macro with integer parameter support

2025-05-06 Thread Andreas Hindborg

Extend the `module!` macro with support module parameters. Also add some string
to integer parsing functions and updates `BStr` with a method to strip a string
prefix.

Based on code by Adam Bratschi-Kaye lifted from the original `rust` branch [1].

Link: 
https://github.com/Rust-for-Linux/linux/tree/bc22545f38d74473cfef3e9fd65432733435b79f
 [1]
Signed-off-by: Andreas Hindborg 
---
Changes in v12:
- Assign through pointer rather than using `core::ptr::replace`.
- Prevent a potential use-after-free during module teardown.
- Link to v11: 
https://lore.kernel.org/r/20250502-module-params-v3-v11-0-6096875a2...@kernel.org

Changes in v11:
- Apply a few nits from Miguel.
- Link to v10: 
https://lore.kernel.org/r/20250501-module-params-v3-v10-0-4da485d34...@kernel.org

Changes in v10:
- Apply fixups from Miguel:
  - Add integer type suffixes to `assert!` in tests.
  - Fix links to docs.kernel.org.
  - Applyy markdown and intra-doc links where possible.
  - Change to `///` for `mod` docs.
  - Slightly reword a comment.
  - Pluralize "Examples" section name.
  - Hide `use`s in example.
  - Removed `#[expect]` for the `rusttest` target.
- Link to v9: 
https://lore.kernel.org/r/20250321-module-params-v3-v9-0-28b905f2e...@kernel.org

Changes in v9:
- Remove UB when parsing the minimum integer values.
- Make `FromStr` trait unsafe, since wrong implementations can cause UB.
- Drop patches that were applied to rust-next.
- Link to v8: 
https://lore.kernel.org/r/20250227-module-params-v3-v8-0-c85d9...@kernel.org

Changes in v8:
- Change print statement in sample to better communicate parameter name.
- Use imperative mode in commit messages.
- Remove prefix path from `EINVAL`.
- Change `try_from_param_arg` to accept `&BStr` rather than `&[u8]`.
- Parse integers without 128 bit integer types.
- Seal trait `FromStrRadix`.
- Strengthen safety requirement of `set_param`.
- Remove comment about Display and `PAGE_SIZE`.
- Add note describing why `ModuleParamAccess` is pub.
- Typo and grammar fixes for documentation.
- Update MAINTAINERS with rust module files.
- Link to v7: 
https://lore.kernel.org/r/20250218-module-params-v3-v7-0-5e1afabca...@kernel.org

Changes in v7:
- Remove dependency on `pr_warn_once` patches, replace with TODO.
- Rework `ParseInt::from_str` to avoid allocating.
- Add a comment explaining how we parse "0".
- Change trait bound on `Index` impl for `BStr` to match std library approach.
- Link to v6: 
https://lore.kernel.org/r/20250211-module-params-v3-v6-0-24b297ddc...@kernel.org

Changes in v6:
- Fix a bug that prevented parsing of negative default values for
  parameters in the `module!` macro.
- Fix a bug that prevented parsing zero in `strip_radix`. Also add a
  test case for this.
- Add `AsRef` for `[u8]` and `BStr`.
- Use `impl AsRef` as type of prefix in `BStr::strip_prefix`.
- Link to v5: 
https://lore.kernel.org/r/20250204-module-params-v3-v5-0-bf5ec2041...@kernel.org

Changes in v5:
- Fix a typo in a safety comment in `set_param`.
- Use a match statement in `parse_int::strip_radix`.
- Add an implementation of `Index` for `BStr`.
- Fix a logic inversion bug where parameters would not be parsed.
- Use `kernel::ffi::c_char` in `set_param` rather than the one in `core`.
- Use `kernel::c_str!` rather than `c"..."` literal in module macro.
- Rebase on v6.14-rc1.
- Link to v4: 
https://lore.kernel.org/r/20250109-module-params-v3-v4-0-c208bcfbe...@kernel.org

Changes in v4:
- Add module maintainers to Cc list (sorry)
- Add a few missing [`doc_links`]
- Add panic section to `expect_string_field`
- Fix a typo in safety requirement of `module_params::free`
- Change `assert!` to `pr_warn_once!` in `module_params::set_param`
- Remove `module_params::get_param` and install null pointer instead
- Remove use of the unstable feature `sync_unsafe_cell`
- Link to v3: 
https://lore.kernel.org/r/20241213-module-params-v3-v3-0-485a015ac...@kernel.org

Changes in v3:
- use `SyncUnsafeCell` rather than `static mut` and simplify parameter access
- remove `Display` bound from `ModuleParam`
- automatically generate documentation for `PARAM_OPS_.*`
- remove `as *const _ as *mut_` phrasing
- inline parameter name in struct instantiation in  `emit_params`
- move `RacyKernelParam` out of macro template
- use C string literals rather than byte string literals with explicit null
- template out `__{name}_{param_name}` in `emit_param`
- indent template in `emit_params`
- use let-else expression in `emit_params` to get rid of an indentation level
- document `expect_string_field`
- move invication of `impl_int_module_param` to be closer to macro def
- move attributes after docs in `make_param_ops`
- rename `impl_module_param` to impl_int_module_param`
- use `ty` instead of `ident` in `impl_parse_int`
- use `BStr` instead of `&str` for string manipulation
- move string parsing functions to seperate patch and add examples, fix bugs
- degrade comment about future support from doc comment to regular comment
- remove std lib path from

[PATCH v12 1/3] rust: str: add radix prefixed integer parsing functions

2025-05-06 Thread Andreas Hindborg

Add the trait `ParseInt` for parsing string representations of integers
where the string representations are optionally prefixed by a radix
specifier. Implement the trait for the primitive integer types.

Tested-by: Daniel Gomez 
Reviewed-by: Greg Kroah-Hartman 
Signed-off-by: Andreas Hindborg 
---
 rust/kernel/str.rs | 172 -
 1 file changed, 171 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/str.rs b/rust/kernel/str.rs
index 878111cb77bc..174e70397305 100644
--- a/rust/kernel/str.rs
+++ b/rust/kernel/str.rs
@@ -573,7 +573,6 @@ macro_rules! c_str {
 }
 
 #[cfg(test)]
-#[expect(clippy::items_after_test_module)]
 mod tests {
 use super::*;
 
@@ -946,3 +945,174 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
 macro_rules! fmt {
 ($($f:tt)*) => ( core::format_args!($($f)*) )
 }
+
+/// Integer parsing functions for parsing signed and unsigned integers
+/// potentially prefixed with `0x`, `0o`, or `0b`.
+pub mod parse_int {
+use crate::prelude::*;
+use crate::str::BStr;
+use core::ops::Deref;
+
+// Make `FromStrRadix` a public type with a private name. This seals
+// `ParseInt`, that is, prevents downstream users from implementing the
+// trait.
+mod private {
+use crate::str::BStr;
+
+/// Trait that allows parsing a [`&BStr`] to an integer with a radix.
+///
+/// # Safety
+///
+/// The member functions of this trait must be implemented according to
+/// their documentation.
+///
+/// [`&BStr`]: kernel::str::BStr
+// This is required because the `from_str_radix` function on the 
primitive
+// integer types is not part of any trait.
+pub unsafe trait FromStrRadix: Sized {
+/// The minimum value this integer type can assume.
+const MIN: Self;
+
+/// Parse `src` to [`Self`] using radix `radix`.
+fn from_str_radix(src: &BStr, radix: u32) -> Result;
+
+/// Return the absolute value of [`Self::MIN`].
+fn abs_min() -> u64;
+
+/// Perform bitwise 2's complement on `self`.
+///
+/// Note: This function does not make sense for unsigned integers.
+fn complement(self) -> Self;
+}
+}
+
+/// Extract the radix from an integer literal optionally prefixed with
+/// one of `0x`, `0X`, `0o`, `0O`, `0b`, `0B`, `0`.
+fn strip_radix(src: &BStr) -> (u32, &BStr) {
+match src.deref() {
+[b'0', b'x' | b'X', rest @ ..] => (16, rest.as_ref()),
+[b'0', b'o' | b'O', rest @ ..] => (8, rest.as_ref()),
+[b'0', b'b' | b'B', rest @ ..] => (2, rest.as_ref()),
+// NOTE: We are including the leading zero to be able to parse
+// literal `0` here. If we removed it as a radix prefix, we would
+// not be able to parse `0`.
+[b'0', ..] => (8, src),
+_ => (10, src),
+}
+}
+
+/// Trait for parsing string representations of integers.
+///
+/// Strings beginning with `0x`, `0o`, or `0b` are parsed as hex, octal, or
+/// binary respectively. Strings beginning with `0` otherwise are parsed as
+/// octal. Anything else is parsed as decimal. A leading `+` or `-` is also
+/// permitted. Any string parsed by [`kstrtol()`] or [`kstrtoul()`] will be
+/// successfully parsed.
+///
+/// [`kstrtol()`]: 
https://docs.kernel.org/core-api/kernel-api.html#c.kstrtol
+/// [`kstrtoul()`]: 
https://docs.kernel.org/core-api/kernel-api.html#c.kstrtoul
+///
+/// # Examples
+///
+/// ```
+/// # use kernel::str::parse_int::ParseInt;
+/// # use kernel::b_str;
+///
+/// assert_eq!(Ok(0u8), u8::from_str(b_str!("0")));
+///
+/// assert_eq!(Ok(0xa2u8), u8::from_str(b_str!("0xa2")));
+/// assert_eq!(Ok(-0xa2i32), i32::from_str(b_str!("-0xa2")));
+///
+/// assert_eq!(Ok(-0o57i8), i8::from_str(b_str!("-0o57")));
+/// assert_eq!(Ok(0o57i8), i8::from_str(b_str!("057")));
+///
+/// assert_eq!(Ok(0b1001i16), i16::from_str(b_str!("0b1001")));
+/// assert_eq!(Ok(-0b1001i16), i16::from_str(b_str!("-0b1001")));
+///
+/// assert_eq!(Ok(127i8), i8::from_str(b_str!("127")));
+/// assert!(i8::from_str(b_str!("128")).is_err());
+/// assert_eq!(Ok(-128i8), i8::from_str(b_str!("-128")));
+/// assert!(i8::from_str(b_str!("-129")).is_err());
+/// assert_eq!(Ok(255u8), u8::from_str(b_str!("255")));
+/// assert!(u8::from_str(b_str!("256")).is_err());
+/// ```
+pub trait ParseInt: private::FromStrRadix + TryFrom {
+/// Parse a string according to the description in [`Self`].
+fn from_str(src: &BStr) -> Result {
+match src.deref() {
+[b'-', rest @ ..] => {
+let (radix, digits) = strip_radix(rest.as_ref());
+// 2's complement valu

[PATCH v12 3/3] modules: add rust modules files to MAINTAINERS

2025-05-06 Thread Andreas Hindborg

The module subsystem people agreed to maintain rust support for modules
[1]. Thus, add entries for relevant files to modules entry in MAINTAINERS.

Link: 
https://lore.kernel.org/all/0d9e596a-5316-4e00-862b-fd77552ae...@suse.com/ [1]

Acked-by: Daniel Gomez 
Signed-off-by: Andreas Hindborg 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3cbf9ac0d83f..d283874843a0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16395,6 +16395,8 @@ F:  include/linux/module*.h
 F: kernel/module/
 F: lib/test_kmod.c
 F: lib/tests/module/
+F: rust/kernel/module_param.rs
+F: rust/macros/module.rs
 F: scripts/module*
 F: tools/testing/selftests/kmod/
 F: tools/testing/selftests/module/

-- 
2.47.2

Re: [PATCH v8 00/10] Basic SEV-SNP Selftests

2025-05-06 Thread Sean Christopherson

On Mon, May 05, 2025, Pratik R. Sampat wrote:
> On 5/5/2025 6:15 PM, Sean Christopherson wrote:
> > On Mon, May 05, 2025, Pratik R. Sampat wrote:
> > Argh, now I remember the issue.  But _sev_platform_init_locked() returns 
> > '0' if
> > psp_init_on_probe is true, and I don't see how deferring 
> > __sev_snp_init_locked()
> > will magically make it succeed the second time around.
> > 
> > So shouldn't the KVM code be this?
> > 
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index e0f446922a6e..dd04f979357d 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3038,6 +3038,14 @@ void __init sev_hardware_setup(void)
> > sev_snp_supported = sev_snp_enabled && 
> > cc_platform_has(CC_ATTR_HOST_SEV_SNP);
> >  
> >  out:
> > +   if (sev_enabled) {
> > +   init_args.probe = true;
> > +   if (sev_platform_init(&init_args))
> > +   sev_supported = sev_es_supported = 
> > sev_snp_supported = false;
> > +   else
> > +   sev_snp_supported &= sev_is_snp_initialized();
> > +   }
> > +
> > if (boot_cpu_has(X86_FEATURE_SEV))
> > pr_info("SEV %s (ASIDs %u - %u)\n",
> > sev_supported ? min_sev_asid <= max_sev_asid ? 
> > "enabled" :
> > @@ -3067,12 +3075,6 @@ void __init sev_hardware_setup(void)
> >  
> > if (!sev_enabled)
> > return;
> > -
> > -   /*
> > -* Do both SNP and SEV initialization at KVM module load.
> > -*/
> > -   init_args.probe = true;
> > -   sev_platform_init(&init_args);
> >  }
> >  
> >  void sev_hardware_unsetup(void)
> > --
> > 
> 
> I agree with this approach. One thing maybe to consider further is to also 
> call
> into SEV_platform_status() to check for init so that SEV/SEV-ES is not
> penalized and disabled for SNP's failures. Another approach could be to break
> up the SEV and SNP init setup so that we can spare a couple of platform calls
> in the process?

Nah, SNP initialization failure should be a rare occurence, I don't want to make
the "normal" flow more complex just to handle something that should never happen
in practice.

[PATCH] selftests: pid_namespace: Fix missing mount headers in pid_max.c

2025-05-06 Thread Moon Hee Lee

Fix compilation error in pid_max.c by including  to define
mount(), umount2(), MS_PRIVATE, MS_REC, and MNT_DETACH.

The test now builds successfully.

Signed-off-by: Moon Hee Lee 
---
 tools/testing/selftests/pid_namespace/pid_max.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/pid_namespace/pid_max.c 
b/tools/testing/selftests/pid_namespace/pid_max.c
index 51c414faabb0..972bedc475f1 100644
--- a/tools/testing/selftests/pid_namespace/pid_max.c
+++ b/tools/testing/selftests/pid_namespace/pid_max.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../kselftest_harness.h"
 #include "../pidfd/pidfd.h"
-- 
2.43.0

Re: [RFC PATCH 02/11] KVM: guest_mem: Add ioctl KVM_LINK_GUEST_MEMFD

2025-05-06 Thread Ackerley Tng

Sean Christopherson  writes:

> On Mon, Aug 07, 2023, Ackerley Tng wrote:
>> KVM_LINK_GUEST_MEMFD will link a gmem fd's underlying inode to a new
>> file (and fd).
>>
>> Signed-off-by: Ackerley Tng 
>> ---
>>  include/uapi/linux/kvm.h |  8 +
>>  virt/kvm/guest_mem.c | 73 
>>  virt/kvm/kvm_main.c  | 10 ++
>>  virt/kvm/kvm_mm.h|  7 
>>  4 files changed, 98 insertions(+)
>>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index eb900344a054..d0e2a2ce0df2 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -2299,4 +2299,12 @@ struct kvm_create_guest_memfd {
>>  __u64 reserved[6];
>>  };
>>
>> +#define KVM_LINK_GUEST_MEMFD_IOWR(KVMIO,  0xd5, struct 
>> kvm_link_guest_memfd)
>> +
>> +struct kvm_link_guest_memfd {
>> +__u64 fd;
>> +__u64 flags;
>> +__u64 reserved[6];
>> +};
>> +
>>  #endif /* __LINUX_KVM_H */
>> diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
>> index 30d0ab8745ee..1b3df273f785 100644
>> --- a/virt/kvm/guest_mem.c
>> +++ b/virt/kvm/guest_mem.c
>> @@ -477,6 +477,79 @@ int kvm_gmem_create(struct kvm *kvm, struct 
>> kvm_create_guest_memfd *args)
>>  return __kvm_gmem_create(kvm, size, flags, kvm_gmem_mnt);
>>  }
>>
>> +static inline void __kvm_gmem_do_link(struct inode *inode)
>> +{
>> +/* Refer to simple_link() */
>> +
>> +inode->i_ctime = current_time(inode);
>> +inc_nlink(inode);
>> +
>> +/*
>> + * ihold() to add additional reference to inode for reference in dentry,
>> + * created in kvm_gmem_alloc_file() -> alloc_file_pseudo(). This is not
>> + * necessary when creating a new file because alloc_inode() creates
>> + * inodes with i_count = 1, which is the refcount for the dentry in the
>> + * file.
>> + */
>> +ihold(inode);
>> +
>> +/*
>> + * dget() and d_instantiate() complete the setup of a dentry, but those
>> + * have already been done in kvm_gmem_alloc_file() ->
>> + * alloc_file_pseudo()
>> + */
>> +}

Thanks Sean, we're just circling back to this series, working on a next
revision.

>
> Does this have to be done before the fd is exposed to userspace, or can it be
> done after?

Does "exposed to userspace" mean the call to get_unused_fd_flags(),
where an fd is reserved?

Do you mean to make this reservation as late as possible?

> If it can be done after, I'd prefer to have the allocation helper
> also install the fd, and also rename it to something that better conveys that
> it's allocating more than just the file, e.g. that it allocates and initialize
> kvm_gmem too.
>
> Completely untested, but this is what I'm thinkin/hoping.
>
> static int kvm_gmem_alloc_view(struct kvm *kvm, struct inode *inode,
>  struct vfsmount *mnt)

Will rename this kvm_gmem_alloc_view(), that naming totally makes
sense, and attaches a meaning to the struct file as a view into the
memory.

> {
>   struct file *file;
>   struct kvm_gmem *gmem;
>
>   gmem = kzalloc(sizeof(*gmem), GFP_KERNEL);
>   if (!gmem)
>   return -ENOMEM;
>
>   fd = get_unused_fd_flags(0);
>   if (fd < 0) {
>   r = fd;
>   goto err_fd;
>   }

Do you see the fd as part of the view? I thought the fd is just a handle
to the view (struct file).

>
>   file = alloc_file_pseudo(inode, mnt, "kvm-gmem", O_RDWR, 
> &kvm_gmem_fops);
>   if (IS_ERR(file)) {
>   r = PTR_ERR(file);
>   goto err_file;
>   }
>
>   file->f_flags |= O_LARGEFILE;
>   file->f_mapping = inode->i_mapping;
>
>   kvm_get_kvm(kvm);
>   gmem->kvm = kvm;
>   xa_init(&gmem->bindings);
>
>   file->private_data = gmem;
>
>   list_add(&gmem->entry, &inode->i_mapping->private_list);
>
>   fd_install(fd, file);
>
>   return 0;
> err:
>   put_unused_fd(fd);
> err_fd:
>   kfree(gmem);
>   return r;
> }
>
> static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags,
>struct vfsmount *mnt)
> {
>   const char *anon_name = "[kvm-gmem]";
>   const struct qstr qname = QSTR_INIT(anon_name, strlen(anon_name));
>   struct inode *inode;
>   struct file *file;
>   int fd, err;
>
>   inode = alloc_anon_inode(mnt->mnt_sb);
>   if (IS_ERR(inode))
>   return PTR_ERR(inode);
>
>   err = security_inode_init_security_anon(inode, &qname, NULL);
>   if (err)
>   goto err;
>
>   inode->i_private = (void *)(unsigned long)flags;
>   inode->i_op = &kvm_gmem_iops;
>   inode->i_mapping->a_ops = &kvm_gmem_aops;
>   inode->i_mode |= S_IFREG;
>   inode->i_size = size;
>   mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
>   mapping_set_large_folios(inode->i_mapping);
>   mapping_set_unevictable(inode->i_mapping);
>   mapping_set_unmovable(inode->i_mapping);
>
>   fd = kvm_gmem_alloc_view(kvm

Re: [PATCH v9 01/19] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)

2025-05-06 Thread Ira Weiny

Fan Ni wrote:
> On Mon, Apr 14, 2025 at 03:19:50PM +0100, Jonathan Cameron wrote:
> > On Sun, 13 Apr 2025 17:52:09 -0500
> > Ira Weiny  wrote:

[snip]

> > 
> > > +
> > > +static bool cxl_verify_dcd_cmds(struct cxl_memdev_state *mds, unsigned 
> > > long *cmds_seen)
> > 
> > It's not immediately obvious to me what the right behavior
> > from something called cxl_verify_dcd_cmds() is.  A comment might help with 
> > that.
> > 
> > I think all it does right now is check if any bits are set. In my head
> > it was going to check that all bits needed for a useful implementation were
> > set. I did have to go check what a 'logical and' of a bitmap was defined as
> > because that bit of the bitmap_and() return value wasn't obvious to me 
> > either!
> 
> The code only checks if any DCD command (48xx) is supported, if any is
> set, it will set "dcd_supported".
> As you mentioned, it seems we should check all the related commands are
> supported, otherwise it is not valid implementation.
> 
> Fan
> > 
> > 
> > > +{
> > > + DECLARE_BITMAP(all_cmds, CXL_DCD_ENABLED_MAX);
> > > + DECLARE_BITMAP(dst, CXL_DCD_ENABLED_MAX);
> > > +
> > > + bitmap_fill(all_cmds, CXL_DCD_ENABLED_MAX);
> > > + return bitmap_and(dst, cmds_seen, all_cmds, CXL_DCD_ENABLED_MAX);

Yea... so this should read:

...
bitmap_and(dst, cmds_seen, all_cmds, CXL_DCD_ENABLED_MAX);
return bitmap_equal(dst, all_cmds, CXL_DCD_ENABLED_MAX);
...

Of course if a device has set any of these commands true it better have
set them all.  Otherwise the device is broken and it will fail in bad
ways.

But I agree with both of you that this is much better and explicit that
something went wrong.  A dev_dbg() might be in order to debug such an
issue.

Ira

[snip]

Re: [PATCH net-next v3] selftests/vsock: add initial vmtest.sh for vsock

2025-05-06 Thread Bobby Eshleman

On Fri, May 02, 2025 at 12:22:46PM +0200, Paolo Abeni wrote:
> On 4/29/25 1:48 AM, Bobby Eshleman wrote:
> > This commit introduces a new vmtest.sh runner for vsock.
> > 
> > It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H,
> > H2G, and loopback. The testing tools from tools/testing/vsock/ are
> > reused. Currently, only vsock_test is used.
> > 
> > VMCI and hyperv support is automatically built, though not used.
> > 
> > Only tested on x86.
> > 
> > To run:
> > 
> >   $ tools/testing/selftests/vsock/vmtest.sh
> > 
> > or
> > 
> >   $ make -C tools/testing/selftests TARGETS=vsock run_tests
> > 
> > Results:
> > # linux/tools/testing/selftests/vsock/vmtest.log
> > setup:  Building kernel and tests
> > setup:  Booting up VM
> > setup:  VM booted up
> > test:vm_server_host_client:guest:   Control socket listening on 
> > 0.0.0.0:51000
> > test:vm_server_host_client:guest:   Control socket connection 
> > accepted...
> > [...]
> > test:vm_loopback:guest: 30 - SOCK_STREAM retry failed connect()...ok
> > test:vm_loopback:guest: 31 - SOCK_STREAM SO_LINGER null-ptr-deref...ok
> > test:vm_loopback:guest: 31 - SOCK_STREAM SO_LINGER null-ptr-deref...ok
> > 
> > Future work can include vsock_diag_test.
> > 
> > vmtest.sh is loosely based off of tools/testing/selftests/net/pmtu.sh,
> > which was picked out of the bag of tests I knew to work with NIPA.
> > 
> > Because vsock requires a VM to test anything other than loopback, this
> > patch adds vmtest.sh as a kselftest itself. This is different than other
> > systems that have a "vmtest.sh", where it is used as a utility script to
> > spin up a VM to run the selftests as a guest (but isn't hooked into
> > kselftest). This aspect is worth review, as I'm not aware of all of the
> > enviroments where this would run.
> 
> I think this approach is interesting, but I think it will need some
> additional more work, see below...
> 
> [...]
> 
> > diff --git a/tools/testing/selftests/vsock/settings 
> > b/tools/testing/selftests/vsock/settings
> > new file mode 100644
> > index 
> > ..e7b9417537fbc4626153b72e8f295ab4594c844b
> > --- /dev/null
> > +++ b/tools/testing/selftests/vsock/settings
> > @@ -0,0 +1 @@
> > +timeout=0
> 
> We need a reasonable, bounded runtime for nipa integration.
> 
> > diff --git a/tools/testing/selftests/vsock/vmtest.sh 
> > b/tools/testing/selftests/vsock/vmtest.sh
> > new file mode 100755
> > index 
> > ..d70b9446e531d6d20beb24ddeda2cf0a9f7e9a39
> > --- /dev/null
> > +++ b/tools/testing/selftests/vsock/vmtest.sh
> > @@ -0,0 +1,354 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# Copyright (c) 2025 Meta Platforms, Inc. and affiliates
> > +#
> > +# Dependencies:
> > +#  * virtme-ng
> > +#  * busybox-static (used by virtme-ng)
> > +#  * qemu  (used by virtme-ng)
> 
> You should probably check for such tools presence and bail out with skip
> otherwise.
> 
> > +
> > +SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
> > +KERNEL_CHECKOUT=$(realpath ${SCRIPT_DIR}/../../../..)
> 
> This is not going to work if/when the self-tests are installed in their
> own directory via `make install` in the tools/testing/selftests/
> directory, and that use case is supposed to work.
> 
> At very least you should check for the expected layout and skip otherwise.
> 
> > +QEMU=$(command -v qemu-system-$(uname -m))
> > +VERBOSE=0
> > +SKIP_BUILD=0
> > +VSOCK_TEST=${KERNEL_CHECKOUT}/tools/testing/vsock/vsock_test
> > +
> > +TEST_GUEST_PORT=51000
> > +TEST_HOST_PORT=5
> > +TEST_HOST_PORT_LISTENER=50001
> > +SSH_GUEST_PORT=22
> > +SSH_HOST_PORT=
> > +VSOCK_CID=1234
> > +WAIT_PERIOD=3
> > +WAIT_PERIOD_MAX=20
> > +
> > +QEMU_PIDFILE=/tmp/qemu.pid
> > +
> > +# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a
> > +# control port forwarded for vsock_test.  Because virtme-ng doesn't support
> > +# adding an additional port to forward to the device created from "--ssh" 
> > and
> > +# virtme-init mistakenly sets identical IPs to the ssh device and 
> > additional
> > +# devices, we instead opt out of using --ssh, add the device manually, and 
> > also
> > +# add the kernel cmdline options that virtme-init uses to setup the 
> > interface.
> > +QEMU_OPTS=""
> > +QEMU_OPTS="${QEMU_OPTS} -netdev 
> > user,id=n0,hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}"
> > +QEMU_OPTS="${QEMU_OPTS},hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}"
> > +QEMU_OPTS="${QEMU_OPTS} -device virtio-net-pci,netdev=n0"
> > +QEMU_OPTS="${QEMU_OPTS} -device vhost-vsock-pci,guest-cid=${VSOCK_CID}"
> > +QEMU_OPTS="${QEMU_OPTS} --pidfile ${QEMU_PIDFILE}"
> > +KERNEL_CMDLINE="virtme.dhcp net.ifnames=0 biosdevname=0 virtme.ssh 
> > virtme_ssh_user=$USER"
> > +
> > +LOG=${SCRIPT_DIR}/vmtest.log
> > +
> > +#  NameDescription
> > +avai

Re: [PATCH v11 2/3] rust: add parameter support to the `module!` macro

2025-05-06 Thread Andreas Hindborg

"Alice Ryhl"  writes:

> On Mon, May 05, 2025 at 11:55:33AM +0200, Andreas Hindborg wrote:
>> "Alice Ryhl"  writes:
>>
>> > On Fri, May 02, 2025 at 02:16:35PM +0200, Andreas Hindborg wrote:
>> > It would be a use-after-free to
>> > access it during module teardown. For example, what if I access this
>> > static during its own destructor? Or during the destructor of another
>> > module parameter?
>>
>> Yes, that is a problem.
>>
>> We can get around it for now by just not calling `free` for now. We only
>> support simple types that do not need drop. I think we would have to
>> seal the `ModuleParam` trait for this.
>>
>> For a proper solution, we could
>>  - Require a token to read the parameter.
>>  - Synchronize on a module private field and return an option from the
>>parameter getter. This would require module exit to run before param
>>free. I think this is the case, but I did not check.
>>  - Use a `Revocable` and revoke the parameter in `free`.
>>
>> Any other ideas or comments on the outlined solutions?
>
> I think the simplest you can do right now is
>
> trait ModuleParam: Copy

Cool 👍

>
> so that it can't contain any non-trivial values. That way you don't need
> Drop either.
>
> Long term, I think we need a way to detect whether it's safe to access
> module globals. The exact same problem applies to the existing global
> for the module itself - except it's worse there because we can't access
> that one during init either.

Yep.



Best regards,
Andreas Hindborg

Re: [PATCH 1/3] remoteproc: imx_rproc: skip clock enable when M-core is managed by the SCU

2025-05-06 Thread Hiago De Franco

Hi Peng,

On Tue, May 06, 2025 at 12:38:35PM +0800, Peng Fan wrote:
> On Mon, May 05, 2025 at 12:48:47PM -0300, Hiago De Franco wrote:
> >From: Hiago De Franco 
> >
> >For the i.MX8X and i.MX8 family SoCs, when the M-core is powered up
> >before Linux starts (e.g., by the bootloader) and it is being managed by
> >the SCU, the SCFW will not allow the kernel to enable the clock again.
> >This currently causes an SCU fault reset when the M-core is up and
> >running and the kernel boots, resetting the system.
> >
> >Therefore, add a check in the clock enable function to not execute it if
> >the M-core is being managed by the SCU.
> >
> >This change affects only the i.MX8X and i.MX8 family SoCs, as this is
> >under the IMX_RPROC_SCU_API method.
> 
> I would rewrite as below: "
> 
> For the i.MX8X and i.MX8 family SoCs, when the M-core is powered up
> by the bootloader, M-core and Linux are in same SCFW(System Controller
> Firmware) partition, so linux has permission to control M-core.
> 
> But when M-core is started, the SCFW will automatically enable the clock
> and configure the rate, and any users that wanna to enable the clock
> will get error 'LOCKED' from SCFW. So current imx_rproc.c probe function
> gets failure because clk_prepare_enable returns failure. Then
> the power domain of M-core is powered off when M-core is still running,
> SCU(System Controller Unit) will get a fault reset, and system restarts.
> 
> To address the issue, ignore handling the clk for i.MX8X and i.MX8 M-core,
> because SCFW automatically enables and configures the clock.
> "
> 
> You may update if you wanna.
> 
> >
> >Signed-off-by: Hiago De Franco 
> >Suggested-by: Peng Fan 
> 
> -> peng@nxp.com

Thanks for the review, I will update the suggestions on a v2. Meanwhile,
I will wait a little bit for other feedbacks.

> 
> Thanks,
> Peng
> 
> >---
> > drivers/remoteproc/imx_rproc.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
> >index 74299af1d7f1..627e57a88db2 100644
> >--- a/drivers/remoteproc/imx_rproc.c
> >+++ b/drivers/remoteproc/imx_rproc.c
> >@@ -1029,8 +1029,8 @@ static int imx_rproc_clk_enable(struct imx_rproc *priv)
> > struct device *dev = priv->dev;
> > int ret;
> > 
> >-/* Remote core is not under control of Linux */
> >-if (dcfg->method == IMX_RPROC_NONE)
> >+/* Remote core is not under control of Linux or it is managed by SCU 
> >API */
> >+if (dcfg->method == IMX_RPROC_NONE || dcfg->method == IMX_RPROC_SCU_API)
> > return 0;
> > 
> > priv->clk = devm_clk_get(dev, NULL);
> >-- 
> >2.39.5
> >

Cheers,
Hiago.

Re: [PATCH 1/3] remoteproc: imx_rproc: skip clock enable when M-core is managed by the SCU

2025-05-06 Thread Mathieu Poirier

On Tue, May 06, 2025 at 09:36:19AM -0300, Hiago De Franco wrote:
> Hi Peng,
> 
> On Tue, May 06, 2025 at 12:38:35PM +0800, Peng Fan wrote:
> > On Mon, May 05, 2025 at 12:48:47PM -0300, Hiago De Franco wrote:
> > >From: Hiago De Franco 
> > >
> > >For the i.MX8X and i.MX8 family SoCs, when the M-core is powered up
> > >before Linux starts (e.g., by the bootloader) and it is being managed by
> > >the SCU, the SCFW will not allow the kernel to enable the clock again.
> > >This currently causes an SCU fault reset when the M-core is up and
> > >running and the kernel boots, resetting the system.
> > >
> > >Therefore, add a check in the clock enable function to not execute it if
> > >the M-core is being managed by the SCU.
> > >
> > >This change affects only the i.MX8X and i.MX8 family SoCs, as this is
> > >under the IMX_RPROC_SCU_API method.
> > 
> > I would rewrite as below: "
> > 
> > For the i.MX8X and i.MX8 family SoCs, when the M-core is powered up
> > by the bootloader, M-core and Linux are in same SCFW(System Controller
> > Firmware) partition, so linux has permission to control M-core.
> > 
> > But when M-core is started, the SCFW will automatically enable the clock
> > and configure the rate, and any users that wanna to enable the clock
> > will get error 'LOCKED' from SCFW. So current imx_rproc.c probe function
> > gets failure because clk_prepare_enable returns failure. Then
> > the power domain of M-core is powered off when M-core is still running,
> > SCU(System Controller Unit) will get a fault reset, and system restarts.
> > 
> > To address the issue, ignore handling the clk for i.MX8X and i.MX8 M-core,
> > because SCFW automatically enables and configures the clock.
> > "
> > 
> > You may update if you wanna.
> > 
> > >
> > >Signed-off-by: Hiago De Franco 
> > >Suggested-by: Peng Fan 
> > 
> > -> peng@nxp.com
> 
> Thanks for the review, I will update the suggestions on a v2. Meanwhile,
> I will wait a little bit for other feedbacks.
>

I suggest you go ahead with a v2 - I have a fair amount of patches to review and
my time to do so is currently very limited.

> > 
> > Thanks,
> > Peng
> > 
> > >---
> > > drivers/remoteproc/imx_rproc.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > >diff --git a/drivers/remoteproc/imx_rproc.c 
> > >b/drivers/remoteproc/imx_rproc.c
> > >index 74299af1d7f1..627e57a88db2 100644
> > >--- a/drivers/remoteproc/imx_rproc.c
> > >+++ b/drivers/remoteproc/imx_rproc.c
> > >@@ -1029,8 +1029,8 @@ static int imx_rproc_clk_enable(struct imx_rproc 
> > >*priv)
> > >   struct device *dev = priv->dev;
> > >   int ret;
> > > 
> > >-  /* Remote core is not under control of Linux */
> > >-  if (dcfg->method == IMX_RPROC_NONE)
> > >+  /* Remote core is not under control of Linux or it is managed by SCU 
> > >API */
> > >+  if (dcfg->method == IMX_RPROC_NONE || dcfg->method == IMX_RPROC_SCU_API)
> > >   return 0;
> > > 
> > >   priv->clk = devm_clk_get(dev, NULL);
> > >-- 
> > >2.39.5
> > >
> 
> Cheers,
> Hiago.

Re: [PATCH v8 00/10] Basic SEV-SNP Selftests

2025-05-06 Thread Kalra, Ashish

Hello Sean,

On 5/5/2025 7:56 PM, Sean Christopherson wrote:
> On Mon, May 05, 2025, Ashish Kalra wrote:
>> On 5/5/2025 6:15 PM, Sean Christopherson wrote:
>>> @@ -3067,12 +3075,6 @@ void __init sev_hardware_setup(void)
>>>  
>>> if (!sev_enabled)
>>> return;
>>> -
>>> -   /*
>>> -* Do both SNP and SEV initialization at KVM module load.
>>> -*/
>>> -   init_args.probe = true;
>>> -   sev_platform_init(&init_args);
>>>  }
>>>  
>>>  void sev_hardware_unsetup(void)
>>> --
>>>
>>> Ashish, what am I missing?
>>>
>>
>> As far as setting sev*_enabled is concerned, i believe they are more specific
>> to SNP/SEV/SEV-ES being enabled in the system, which is separate from
>> SEV_INIT/SNP_INIT (SNP_INIT success indicates that RMP been initialized, SNP
>> has to be already enabled via MSR_SYSCFG before SNP_INIT is called), though
>> SEV_INIT/SNP_INIT may fail but SEV/SNP support will still be enabled on the
>> system.
> 
> No, if SNP_INIT fails and has zero chance of succeeding, then SNP is *NOT*
> supported *by KVM*.  The platform may be configured to support SNP, but that
> matters not at all if KVM can't actually use the functionality.
>

Yes, i agree.
 
>> Additionally as SEV_INIT/SNP_INIT during sev_platform_init() have failed, so
>> any SEV/SEV-ES/SNP VM launch will fail as the firmware will return invalid
>> platform state as INITs have failed.
> 
> Yeah, and that's *awful* behavior for KVM.  Imagine if KVM did that for every
> feature, i.e. enumerated hardware support irrespective of KVM support.
> 
> The API is KVM_GET_SUPPORTED_CPUID, not KVM_GET_MOSTLY_SUPPORTED_CPUID.
> 
>> >From my understanding, sev*_enabled indicates the user support to
>>> enable/disable support for SEV/SEV-ES/SEV-SNP, 
> 
> Yes, and they're also used to reflect and enumerate KVM support:
> 
>   if (sev_enabled) {
>   kvm_cpu_cap_set(X86_FEATURE_SEV);
>   kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_VM);
>   }
>   if (sev_es_enabled) {
>   kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
>   kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_ES_VM);
>   }
>   if (sev_snp_enabled) {
>   kvm_cpu_cap_set(X86_FEATURE_SEV_SNP);
>   kvm_caps.supported_vm_types |= BIT(KVM_X86_SNP_VM);
>   }
> 
>> as the sev*_enabled are the KVM module parameters, while sev*_supported
>> indicates if platform has that support enabled.
> 
> sev*_supported are completely irrelevant.  They are function local scratch 
> variables
> that exist so that KVM doesn't clobber userspace's inputs while computing 
> what is
> fully supported and enabled.
> 

Ok.

>> And before the SEV/SNP init support was moved to KVM from CCP module, doing
>> SEV/SNP INIT could fail but that still had KVM detecting SEV/SNP support
>> enabled, so this moving SEV/SNP init stuff to KVM module from CCP driver is
>> consistent with the previous behavior.
> 
> And one of my driving motivations for getting the initialization into KVM was 
> to
> fix that previous behavior.

Sure, i will test your patch and post it.  

Thanks,
Ashish

Re: [PATCH] rcu/nocb: Add Safe checks for access offloaded rdp

2025-05-06 Thread Joel Fernandes




> On May 6, 2025, at 2:26 AM, Z qiang  wrote:
> 
> 
>> 
>> 
>> 
>> 
>>> On 4/30/2025 12:14 PM, Joel Fernandes wrote:
>>> 
>>> 
>>> On 4/30/2025 10:57 AM, Z qiang wrote:
> 
> 
> 
> On 4/28/2025 6:59 AM, Z qiang wrote:
>>> 
>>> Le Mon, Apr 28, 2025 at 05:54:03PM +0800, Zqiang a écrit :
 For Preempt-RT kernel, when enable CONFIG_PROVE_RCU Kconfig,
 disable local bh in rcuc kthreads will not affect preempt_count(),
 this resulted in the following splat:
 
 WARNING: suspicious RCU usage
 kernel/rcu/tree_plugin.h:36 Unsafe read of RCU_NOCB offloaded state!
 stack backtrace:
 CPU: 0 UID: 0 PID: 22 Comm: rcuc/0
 Call Trace:
 [0.407907]  
 [0.407910]  dump_stack_lvl+0xbb/0xd0
 [0.407917]  dump_stack+0x14/0x20
 [0.407920]  lockdep_rcu_suspicious+0x133/0x210
 [0.407932]  rcu_rdp_is_offloaded+0x1c3/0x270
 [0.407939]  rcu_core+0x471/0x900
 [0.407942]  ? lockdep_hardirqs_on+0xd5/0x160
 [0.407954]  rcu_cpu_kthread+0x25f/0x870
 [0.407959]  ? __pfx_rcu_cpu_kthread+0x10/0x10
 [0.407966]  smpboot_thread_fn+0x34c/0xa50
 [0.407970]  ? trace_preempt_on+0x54/0x120
 [0.407977]  ? __pfx_smpboot_thread_fn+0x10/0x10
 [0.407982]  kthread+0x40e/0x840
 [0.407990]  ? __pfx_kthread+0x10/0x10
 [0.407994]  ? rt_spin_unlock+0x4e/0xb0
 [0.407997]  ? rt_spin_unlock+0x4e/0xb0
 [0.408000]  ? __pfx_kthread+0x10/0x10
 [0.408006]  ? __pfx_kthread+0x10/0x10
 [0.408011]  ret_from_fork+0x40/0x70
 [0.408013]  ? __pfx_kthread+0x10/0x10
 [0.408018]  ret_from_fork_asm+0x1a/0x30
 [0.408042]  
 
 Currently, triggering an rdp offloaded state change need the
 corresponding rdp's CPU goes offline, and at this time the rcuc
 kthreads has already in parking state. this means the corresponding
 rcuc kthreads can safely read offloaded state of rdp while it's
 corresponding cpu is online.
 
 This commit therefore add rdp->rcu_cpu_kthread_task check for
 Preempt-RT kernels.
 
 Signed-off-by: Zqiang 
 ---
 kernel/rcu/tree_plugin.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
 index 003e549f6514..fe728eded36e 100644
 --- a/kernel/rcu/tree_plugin.h
 +++ b/kernel/rcu/tree_plugin.h
 @@ -31,7 +31,9 @@ static bool rcu_rdp_is_offloaded(struct rcu_data 
 *rdp)
lockdep_is_held(&rcu_state.nocb_mutex) ||
(!(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible()) &&
 rdp == this_cpu_ptr(&rcu_data)) ||
 -   rcu_current_is_nocb_kthread(rdp)),
 +   rcu_current_is_nocb_kthread(rdp) ||
 +   (IS_ENABLED(CONFIG_PREEMPT_RT) &&
 +current == rdp->rcu_cpu_kthread_task)),
>>> 
>>> Isn't it safe also on !CONFIG_PREEMPT_RT ?
>> 
>> For !CONFIG_PREEMPT_RT and  in rcuc kthreads, it's also safe,
>> but the following check will passed :
>> 
>> (!(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible()) &&
>>  rdp == this_cpu_ptr(&rcu_data))
> 
> I think the fact that it already passes for !PREEMPT_RT does not matter, 
> because
> it simplifies the code so drop the PREEMPT_RT check?
> 
> Or will softirq_count() not work? It appears to have special casing for
> PREEMPT_RT's local_bh_disable():
> 
> (   ( !(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preemptible()) || 
> softirq_count() )
>   && rdp == this_cpu_ptr(&rcu_data))  )
 
 Thank you for Joel's reply,  I also willing to accept such
 modifications and resend :) .
>>> Thanks, I am Ok with either approach whichever you and Frederic together 
>>> decide.
>>> I can then pull this in for the v6.16 merge window once you resend, thanks!
>>> 
>> 
>> Frederic, there are a couple of ways we can move forward hear. Does the
>> softirq_count() approach sound good to you? If yes, I can fixup the patch 
>> myself.
> 
> Hello, Joel
> 
> If you send a patch to fix it, I'd be happy,  you can add me as the
> Reported-by ;)

Actually Z, could you send the patch with the suggestion above after 
appropriate testing? That way I will be more comfortable applying it for 6.16.

Sorry for any confusion, 

Thanks!

- Joel


> 
> Thanks
> Zqiang
> 
>> 
>> I am also Ok at this point to take it in for 6.16, though I've also stored 
>> it in
>> my rcu/dev branch for Neeraj's 6.17 PR, just in case :)
>> 
>> - Joel
>> 
>>

[PATCH v2] remoteproc: xlnx: avoid RPU force power down

2025-05-06 Thread Tanmay Shah

Powering off RPU using force_pwrdwn call results in system failure
if there are multiple users of that RPU node. Better mechanism is to use
request_node and release_node EEMI calls. With use of these EEMI calls,
platform management controller will take-care of powering off RPU
when there is no user.

Signed-off-by: Tanmay Shah 
---

Changes in v2:
  - Add comment on why version check is needed before calling EEMI call
to fw.

 drivers/remoteproc/xlnx_r5_remoteproc.c | 34 -
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 5aeedeaf3c41..1af89782e116 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -380,6 +380,18 @@ static int zynqmp_r5_rproc_start(struct rproc *rproc)
dev_dbg(r5_core->dev, "RPU boot addr 0x%llx from %s.", rproc->bootaddr,
bootmem == PM_RPU_BOOTMEM_HIVEC ? "OCM" : "TCM");
 
+   /* Request node before starting RPU core if new version of API is 
supported */
+   if (zynqmp_pm_feature(PM_REQUEST_NODE) > 1) {
+   ret = zynqmp_pm_request_node(r5_core->pm_domain_id,
+ZYNQMP_PM_CAPABILITY_ACCESS, 0,
+ZYNQMP_PM_REQUEST_ACK_BLOCKING);
+   if (ret < 0) {
+   dev_err(r5_core->dev, "failed to request 0x%x",
+   r5_core->pm_domain_id);
+   return ret;
+   }
+   }
+
ret = zynqmp_pm_request_wake(r5_core->pm_domain_id, 1,
 bootmem, ZYNQMP_PM_REQUEST_ACK_NO);
if (ret)
@@ -401,10 +413,30 @@ static int zynqmp_r5_rproc_stop(struct rproc *rproc)
struct zynqmp_r5_core *r5_core = rproc->priv;
int ret;
 
+   /* Use release node API to stop core if new version of API is supported 
*/
+   if (zynqmp_pm_feature(PM_RELEASE_NODE) > 1) {
+   ret = zynqmp_pm_release_node(r5_core->pm_domain_id);
+   if (ret)
+   dev_err(r5_core->dev, "failed to stop remoteproc RPU 
%d\n", ret);
+   return ret;
+   }
+
+   /*
+* Check expected version of EEMI call before calling it. This avoids
+* any error or warning prints from firmware as it is expected that fw
+* doesn't support it.
+*/
+   if (zynqmp_pm_feature(PM_FORCE_POWERDOWN) != 1) {
+   dev_dbg(r5_core->dev, "EEMI interface %d ver 1 not supported\n",
+   PM_FORCE_POWERDOWN);
+   return -EOPNOTSUPP;
+   }
+
+   /* maintain force pwr down for backward compatibility */
ret = zynqmp_pm_force_pwrdwn(r5_core->pm_domain_id,
 ZYNQMP_PM_REQUEST_ACK_BLOCKING);
if (ret)
-   dev_err(r5_core->dev, "failed to stop remoteproc RPU %d\n", 
ret);
+   dev_err(r5_core->dev, "core force power down failed\n");
 
return ret;
 }

base-commit: afc760ba751c289915fe10c12d836c31d23f6ddd
-- 
2.34.1

Re: [PATCH v3 0/2] Fix two memory leaks in rproc_attach()

2025-05-06 Thread Mathieu Poirier

On Wed, Apr 30, 2025 at 05:20:41PM +0800, Xiaolei Wang wrote:
> In the rproc_attach() function, if rproc_handle_resources() returns
> failure, the resources requested in imx_rproc_prepare() should be
> released, since almost the same thing is done in imx_rproc_prepare() and
> rproc_resource_cleanup(), Function rproc_resource_cleanup() is able
> to deal with empty lists so it is better to fix the "goto" statements
> in rproc_attach(). replace the "unprepare_device" goto statement with
> "clean_up_resources" and get rid of the "unprepare_device" label.
> and rproc->clean_table should also be released
> 
> Changes in v3:
> Update patch1, replace the "unprepare_device" goto statement with
> "clean_up_resources" and get rid of the "unprepare_device" label.
> 
> V2:
> Updated the commit log of these two patches
> 
> https://patchwork.kernel.org/project/linux-remoteproc/patch/20250426065348.1234391-2-xiaolei.w...@windriver.com/
> 
> https://patchwork.kernel.org/project/linux-remoteproc/patch/20250426065348.1234391-3-xiaolei.w...@windriver.com/
> 
> V1: 
> https://patchwork.kernel.org/project/linux-remoteproc/patch/20250424122252.2777363-1-xiaolei.w...@windriver.com/
> 
> https://patchwork.kernel.org/project/linux-remoteproc/patch/20250424122252.2777363-2-xiaolei.w...@windriver.com/
> 
> Xiaolei Wang (2):
>   remoteproc: cleanup acquired resources when rproc_handle_resources()
> fails in rproc_attach()
>   remoteproc: core: release rproc->clean_table after rproc_attach()
> fails
> 
>  drivers/remoteproc/remoteproc_core.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 

I have applied this patchset.

Thanks,
Mathieu

> -- 
> 2.25.1
>

Re: [PATCH v4 1/9] slab: add opt-in caching layer of percpu sheaves

2025-05-06 Thread Suren Baghdasaryan

On Mon, Apr 28, 2025 at 12:01 AM Vlastimil Babka  wrote:
>
> On 4/25/25 19:31, Christoph Lameter (Ampere) wrote:
> > On Fri, 25 Apr 2025, Vlastimil Babka wrote:
> >
> >> @@ -4195,7 +4793,11 @@ static __fastpath_inline void 
> >> *slab_alloc_node(struct kmem_cache *s, struct list
> >>  if (unlikely(object))
> >>  goto out;
> >>
> >> -object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
> >> +if (s->cpu_sheaves && node == NUMA_NO_NODE)
> >> +object = alloc_from_pcs(s, gfpflags);
> >
> > The node to use is determined in __slab_alloc_node() only based on the
> > memory policy etc. NUMA_NO_NODE allocations can be redirected by memory
> > policies and this check disables it.
>
> To handle that, alloc_from_pcs() contains this:
>
> #ifdef CONFIG_NUMA
> if (static_branch_unlikely(&strict_numa)) {
> if (current->mempolicy)
> return NULL;
> }
> #endif
>
> And so there will be a fallback. It doesn't (currently) try to evaluate if
> the local node is compatible as this is before taking the local lock (and
> thus preventing migration).
>
>
> >> @@ -4653,7 +5483,10 @@ void slab_free(struct kmem_cache *s, struct slab 
> >> *slab, void *object,
> >>  memcg_slab_free_hook(s, slab, &object, 1);
> >>  alloc_tagging_slab_free_hook(s, slab, &object, 1);
> >>
> >> -if (likely(slab_free_hook(s, object, slab_want_init_on_free(s), 
> >> false)))
> >> +if (unlikely(!slab_free_hook(s, object, slab_want_init_on_free(s), 
> >> false)))
> >> +return;
> >> +
> >> +if (!s->cpu_sheaves || !free_to_pcs(s, object))
> >>  do_slab_free(s, slab, object, object, 1, addr);
> >>  }
> >
> > We free to pcs even if the object is remote?

Overall the patch LGTM but I would like to hear the answer to this
question too, please.

> >
>

Re: [PATCH v9 01/19] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)

2025-05-06 Thread Fan Ni

On Tue, May 06, 2025 at 11:09:09AM -0500, Ira Weiny wrote:
> Fan Ni wrote:
> > On Mon, Apr 14, 2025 at 03:19:50PM +0100, Jonathan Cameron wrote:
> > > On Sun, 13 Apr 2025 17:52:09 -0500
> > > Ira Weiny  wrote:
> 
> [snip]
> 
> > > 
> > > > +
> > > > +static bool cxl_verify_dcd_cmds(struct cxl_memdev_state *mds, unsigned 
> > > > long *cmds_seen)
> > > 
> > > It's not immediately obvious to me what the right behavior
> > > from something called cxl_verify_dcd_cmds() is.  A comment might help 
> > > with that.
> > > 
> > > I think all it does right now is check if any bits are set. In my head
> > > it was going to check that all bits needed for a useful implementation 
> > > were
> > > set. I did have to go check what a 'logical and' of a bitmap was defined 
> > > as
> > > because that bit of the bitmap_and() return value wasn't obvious to me 
> > > either!
> > 
> > The code only checks if any DCD command (48xx) is supported, if any is
> > set, it will set "dcd_supported".
> > As you mentioned, it seems we should check all the related commands are
> > supported, otherwise it is not valid implementation.
> > 
> > Fan
> > > 
> > > 
> > > > +{
> > > > +   DECLARE_BITMAP(all_cmds, CXL_DCD_ENABLED_MAX);
> > > > +   DECLARE_BITMAP(dst, CXL_DCD_ENABLED_MAX);
> > > > +
> > > > +   bitmap_fill(all_cmds, CXL_DCD_ENABLED_MAX);
> > > > +   return bitmap_and(dst, cmds_seen, all_cmds, 
> > > > CXL_DCD_ENABLED_MAX);
> 
> Yea... so this should read:
> 
> ...
>   bitmap_and(dst, cmds_seen, all_cmds, CXL_DCD_ENABLED_MAX);
>   return bitmap_equal(dst, all_cmds, CXL_DCD_ENABLED_MAX);
Maybe only 
return bitmap_equal(cmds_seen, all_cmds, CXL_DCD_ENABLED_MAX)?

Fan
> ...
> 
> Of course if a device has set any of these commands true it better have
> set them all.  Otherwise the device is broken and it will fail in bad
> ways.
> 
> But I agree with both of you that this is much better and explicit that
> something went wrong.  A dev_dbg() might be in order to debug such an
> issue.
> 
> Ira
> 
> [snip]

-- 
Fan Ni

Re: [RFC PATCH 39/39] KVM: guest_memfd: Dynamically split/reconstruct HugeTLB page

2025-05-06 Thread Ackerley Tng

Yan Zhao  writes:

>> > 
>> >
>> > What options does userspace have in this scenario?
>> > It can't reduce the flag to KVM_GUEST_MEMFD_HUGE_2MB. Adjusting the 
>> > gmem.pgoff
>> > isn't ideal either.
>> >
>> > What about something similar as below?
>> >
>> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> > index d2feacd14786..87c33704a748 100644
>> > --- a/virt/kvm/guest_memfd.c
>> > +++ b/virt/kvm/guest_memfd.c
>> > @@ -1842,8 +1842,16 @@ __kvm_gmem_get_pfn(struct file *file, struct 
>> > kvm_memory_slot *slot,
>> > }
>> >
>> > *pfn = folio_file_pfn(folio, index);
>> > -   if (max_order)
>> > -   *max_order = folio_order(folio);
>> > +   if (max_order) {
>> > +   int order;
>> > +
>> > +   order = folio_order(folio);
>> > +
>> > +   while (order > 0 && ((slot->base_gfn ^ slot->gmem.pgoff) & 
>> > ((1 << order) - 1)))
>> > +   order--;
>> > +
>> > +   *max_order = order;
>> > +   }
>> >
>> > *is_prepared = folio_test_uptodate(folio);
>> > return folio;
>> >
>> 
>> Vishal was wondering how this is working before guest_memfd was
>> introduced, for other backing memory like HugeTLB.
>> 
>> I then poked around and found this [1]. I will be adding a similar check
>> for any slot where kvm_slot_can_be_private(slot).
>>
>> Yan, that should work, right?
> No, I don't think the checking of ugfn [1] should work.
>
> 1. Even for slots bound to in-place-conversion guest_memfd (i.e. shared memory
> are allocated from guest_memfd), the slot->userspace_addr does not necessarily
> have the same offset as slot->gmem.pgoff. Even if we audit the offset in
> kvm_gmem_bind(), userspace could invoke munmap() and mmap() afterwards, 
> causing
> slot->userspace_addr to point to a different offset.
>
> 2. for slots bound to guest_memfd that do not support in-place-conversion,
> shared memory is allocated from a different backend. Therefore, checking
> "slot->base_gfn ^ slot->gmem.pgoff" is required for private memory. The check 
> is
> currently absent because guest_memfd supports 4K only.
>
>

Let me clarify, I meant these changes:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4b64ab3..d0dccf1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12938,6 +12938,11 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, 
unsigned long npages)
return 0;
 }
 
+static inline bool kvm_is_level_aligned(u64 value, int level)
+{
+   return IS_ALIGNED(value, KVM_PAGES_PER_HPAGE(level));
+}
+
 static int kvm_alloc_memslot_metadata(struct kvm *kvm,
  struct kvm_memory_slot *slot)
 {
@@ -12971,16 +12976,20 @@ static int kvm_alloc_memslot_metadata(struct kvm *kvm,
 
slot->arch.lpage_info[i - 1] = linfo;
 
-   if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
+   if (!kvm_is_level_aligned(slot->base_gfn, level))
linfo[0].disallow_lpage = 1;
-   if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 
1))
+   if (!kvm_is_level_aligned(slot->base_gfn + npages, level))
linfo[lpages - 1].disallow_lpage = 1;
ugfn = slot->userspace_addr >> PAGE_SHIFT;
/*
-* If the gfn and userspace address are not aligned wrt each
-* other, disable large page support for this slot.
+* If the gfn and userspace address are not aligned or if gfn
+* and guest_memfd offset are not aligned wrt each other,
+* disable large page support for this slot.
 */
-   if ((slot->base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1)) 
{
+   if (!kvm_is_level_aligned(slot->base_gfn ^ ugfn, level) ||
+   (kvm_slot_can_be_private(slot) &&
+!kvm_is_level_aligned(slot->base_gfn ^ slot->gmem.pgoff,
+  level))) {
unsigned long j;
 
for (j = 0; j < lpages; ++j)

This does not rely on the ugfn check, but adds a similar check for gmem.pgoff.

I think this should take care of case (1.), for guest_memfds going to be
used for both shared and private memory. Userspace can't update
slot->userspace_addr, since guest_memfd memslots cannot be updated and
can only be deleted.

If userspace re-uses slot->userspace_addr for some other memory address
without deleting and re-adding a memslot,

+ KVM's access to memory should still be fine, since after the recent
  discussion at guest_memfd upstream call, KVM's guest faults will
  always go via fd+offset and KVM's access won't be disrupted
  there. Whatever checking done at memslot binding time will still be
  valid.
+ Host's access and other accesses (e.g. instruction emulation, which
  uses slot->userspace_addr) to guest memory will be broken, but I think
  t

Re: [PATCH v4 00/38] Mediated vPMU 4.0 for x86

2025-05-06 Thread Sean Christopherson

On Tue, May 06, 2025, Dapeng Mi wrote:
> Hi Sean,
> 
> Not sure if you have bandwidth to review this mediated vPMU v4 patchset?

I'm getting there.  I wanted to get through all the stuff I thought would likely
be ready for 6.16 as-is before moving onto the larger series.

Re: [PATCH v2 06/19] iommufd/viommu: Add IOMMU_VIOMMU_SET/UNSET_VDEV_ID ioctl

2025-05-06 Thread Nicolin Chen

On Tue, May 06, 2025 at 09:58:47AM -0300, Jason Gunthorpe wrote:
> On Mon, May 05, 2025 at 07:53:44PM -0700, Nicolin Chen wrote:
> > On Mon, May 05, 2025 at 02:08:07PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 30, 2025 at 12:58:47AM -0700, Nicolin Chen wrote:
> > > The bus numbers can be reassigned at any time on the fly by the guest
> > > by reprogramming the PCI hierarchy.
> > 
> > Yes. If we take some aggressive use case into account, where its
> > BDF number could change multiple times, I think it's natural for
> > VMM to simply destroy the previous vDEVICE and allocate a new one
> > with a new BDF number, right?
> 
> We should not destroy the vdevice for something like that. In a CC
> case that would unplug it from the VM which is not right.

CC needs BDF to allocate a VDEV in the CC world. So, it cannot
allocate a VDEV with BDF=0 firstly, which is what Alex reported.

And even for a normal case that the device is migrating between
PCI buses, CC might not be able to update the BDF (which can be
a part of CC_VDEV_ALLOC instruction), even if iommufd supports
an update ioctl. So, the underlying handler for an update ioctl
is still to first destroy the VDEV and re-allocate a new one?

Thanks
Nicolin

Re: [PATCH v4 2/9] slab: add sheaf support for batching kfree_rcu() operations

2025-05-06 Thread Suren Baghdasaryan

On Fri, Apr 25, 2025 at 1:27 AM Vlastimil Babka  wrote:
>
> Extend the sheaf infrastructure for more efficient kfree_rcu() handling.
> For caches with sheaves, on each cpu maintain a rcu_free sheaf in
> addition to main and spare sheaves.
>
> kfree_rcu() operations will try to put objects on this sheaf. Once full,
> the sheaf is detached and submitted to call_rcu() with a handler that
> will try to put it in the barn, or flush to slab pages using bulk free,
> when the barn is full. Then a new empty sheaf must be obtained to put
> more objects there.
>
> It's possible that no free sheaves are available to use for a new
> rcu_free sheaf, and the allocation in kfree_rcu() context can only use
> GFP_NOWAIT and thus may fail. In that case, fall back to the existing
> kfree_rcu() implementation.
>
> Expected advantages:
> - batching the kfree_rcu() operations, that could eventually replace the
>   existing batching
> - sheaves can be reused for allocations via barn instead of being
>   flushed to slabs, which is more efficient
>   - this includes cases where only some cpus are allowed to process rcu
> callbacks (Android)
>
> Possible disadvantage:
> - objects might be waiting for more than their grace period (it is
>   determined by the last object freed into the sheaf), increasing memory
>   usage - but the existing batching does that too.
>
> Only implement this for CONFIG_KVFREE_RCU_BATCHED as the tiny
> implementation favors smaller memory footprint over performance.
>
> Add CONFIG_SLUB_STATS counters free_rcu_sheaf and free_rcu_sheaf_fail to
> count how many kfree_rcu() used the rcu_free sheaf successfully and how
> many had to fall back to the existing implementation.
>
> Signed-off-by: Vlastimil Babka 
> ---
>  mm/slab.h|   3 +
>  mm/slab_common.c |  24 
>  mm/slub.c| 183 
> ++-
>  3 files changed, 208 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 
> 1980330c2fcb4a4613a7e4f7efc78b349993fd89..ddf1e4bcba734dccbf67e83bdbab3ca7272f540e
>  100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -459,6 +459,9 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
> return !(s->flags & 
> (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));
>  }
>
> +bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj);
> +
> +/* Legal flag mask for kmem_cache_create(), for various configurations */
>  #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
>  SLAB_CACHE_DMA32 | SLAB_PANIC | \
>  SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 
> 4f295bdd2d42355af6311a799955301005f8a532..6c3b90f03cb79b57f426824450f576a977d85c53
>  100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1608,6 +1608,27 @@ static void kfree_rcu_work(struct work_struct *work)
> kvfree_rcu_list(head);
>  }
>
> +static bool kfree_rcu_sheaf(void *obj)
> +{
> +   struct kmem_cache *s;
> +   struct folio *folio;
> +   struct slab *slab;
> +
> +   if (is_vmalloc_addr(obj))
> +   return false;
> +
> +   folio = virt_to_folio(obj);
> +   if (unlikely(!folio_test_slab(folio)))
> +   return false;
> +
> +   slab = folio_slab(folio);
> +   s = slab->slab_cache;
> +   if (s->cpu_sheaves)
> +   return __kfree_rcu_sheaf(s, obj);
> +
> +   return false;
> +}
> +
>  static bool
>  need_offload_krc(struct kfree_rcu_cpu *krcp)
>  {
> @@ -1952,6 +1973,9 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
> if (!head)
> might_sleep();
>
> +   if (kfree_rcu_sheaf(ptr))
> +   return;
> +
> // Queue the object but don't yet schedule the batch.
> if (debug_rcu_head_queue(ptr)) {
> // Probable double kfree_rcu(), just leak.
> diff --git a/mm/slub.c b/mm/slub.c
> index 
> ae3e80ad9926ca15601eef2f2aa016ca059498f8..6f31a27b5d47fa6621fa8af6d6842564077d4b60
>  100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -350,6 +350,8 @@ enum stat_item {
> ALLOC_FASTPATH, /* Allocation from cpu slab */
> ALLOC_SLOWPATH, /* Allocation by getting a new cpu slab */
> FREE_PCS,   /* Free to percpu sheaf */
> +   FREE_RCU_SHEAF, /* Free to rcu_free sheaf */
> +   FREE_RCU_SHEAF_FAIL,/* Failed to free to a rcu_free sheaf */
> FREE_FASTPATH,  /* Free to cpu slab */
> FREE_SLOWPATH,  /* Freeing not to cpu slab */
> FREE_FROZEN,/* Freeing to frozen slab */
> @@ -444,6 +446,7 @@ struct slab_sheaf {
> struct rcu_head rcu_head;
> struct list_head barn_list;
> };
> +   struct kmem_cache *cache;
> unsigned int size;
> void *objects[];
>  };
> @@ -452,6 +455,7 @@ struct slub_percpu_sheaves {
> local_trylock_t lock

Re: [PATCH v3 3/3] selftests: pidfd: add tests for PIDFD_SELF_*

2025-05-06 Thread John Hubbard

On 5/6/25 2:18 PM, Shuah Khan wrote:
> On 5/1/25 05:42, Peter Zijlstra wrote:
>> On Wed, Oct 16, 2024 at 07:14:34PM -0700, John Hubbard wrote:
>>> On 10/16/24 3:06 PM, Lorenzo Stoakes wrote:
 On Wed, Oct 16, 2024 at 02:00:27PM -0600, Shuah Khan wrote:
> On 10/16/24 04:20, Lorenzo Stoakes wrote:
>>> ...
>> Please fix this fucking selftests shit to just build. This is unusable
>> garbage.
> 
> I don't recall all the reasons why kselftests needed "make headers"
> One reason I could think of was that when a new test depends on a
> header change, the test won't build unless headers are installed.

...or until an updated copy of that updated header file is copied
somewhere, and then included in the kselftests. That's the approach
that I ultimately settled upon, after some discussion and negotion.

Details below.

> 
> If this requirement is causing problems for tests that don't fall
> into the category and we probably have more of them mow, we can
> clean that up.
> 
> John, you mentioned you got mm tests working without headers?
> Can you share the commit here.
> 

Yes. This one sets up the general approach, which is available to
all kselftests: TOOLS_INCLUDES. It also changes selftests/mm to
set TOOLS_INCLUDES in that build:

e076eaca5906 ("selftests: break the dependency upon local header files")

And here is a representative application of the above, to selftests/mm. In
other words, taking advantage of the new file location pointed to by
TOOLS_INCLUDES:

580ea358af0a ("selftests/mm: fix additional build errors for selftests")

thanks,
-- 
John Hubbard

[PATCH v2] selftests: pid_namespace: Fix missing mount headers in pid_max.c

2025-05-06 Thread Moon Hee Lee

Fix compilation errors in pid_max.c by including , which
provides definitions for mount(), umount2(), MS_PRIVATE, MS_REC, and
MNT_DETACH.

Without this header, the build fails with implicit declarations and
undefined constants during selftest compilation.

Changes since v1:
- Included example build errors below for clarity

Compile error:
pid_max.c: In function ‘pid_max_cb’:
pid_max.c:42:15: warning: implicit declaration of function ‘mount’ 
[-Wimplicit-function-declaration]
   42 | ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
  |   ^
pid_max.c:42:36: error: ‘MS_PRIVATE’ undeclared; did you mean ‘MAP_PRIVATE’?
pid_max.c:42:49: error: ‘MS_REC’ undeclared
pid_max.c:48:9: warning: implicit declaration of function ‘umount2’
pid_max.c:48:26: error: ‘MNT_DETACH’ undeclared

[...output trimmed for brevity...]

make: *** [../lib.mk:222: 
/linux_mainline/tools/testing/selftests/pid_namespace/pid_max] Error 1

Signed-off-by: Moon Hee Lee 
---
 tools/testing/selftests/pid_namespace/pid_max.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/pid_namespace/pid_max.c 
b/tools/testing/selftests/pid_namespace/pid_max.c
index 51c414faabb0..972bedc475f1 100644
--- a/tools/testing/selftests/pid_namespace/pid_max.c
+++ b/tools/testing/selftests/pid_namespace/pid_max.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../kselftest_harness.h"
 #include "../pidfd/pidfd.h"
-- 
2.43.0

Re: [PATCH v4 3/9] slab: sheaf prefilling for guaranteed allocations

2025-05-06 Thread Suren Baghdasaryan

On Fri, Apr 25, 2025 at 1:28 AM Vlastimil Babka  wrote:
>
> Add functions for efficient guaranteed allocations e.g. in a critical
> section that cannot sleep, when the exact number of allocations is not
> known beforehand, but an upper limit can be calculated.
>
> kmem_cache_prefill_sheaf() returns a sheaf containing at least given
> number of objects.
>
> kmem_cache_alloc_from_sheaf() will allocate an object from the sheaf
> and is guaranteed not to fail until depleted.
>
> kmem_cache_return_sheaf() is for giving the sheaf back to the slab
> allocator after the critical section. This will also attempt to refill
> it to cache's sheaf capacity for better efficiency of sheaves handling,
> but it's not stricly necessary to succeed.
>
> kmem_cache_refill_sheaf() can be used to refill a previously obtained
> sheaf to requested size. If the current size is sufficient, it does
> nothing. If the requested size exceeds cache's sheaf_capacity and the
> sheaf's current capacity, the sheaf will be replaced with a new one,
> hence the indirect pointer parameter.
>
> kmem_cache_sheaf_size() can be used to query the current size.
>
> The implementation supports requesting sizes that exceed cache's
> sheaf_capacity, but it is not efficient - such "oversize" sheaves are
> allocated fresh in kmem_cache_prefill_sheaf() and flushed and freed
> immediately by kmem_cache_return_sheaf(). kmem_cache_refill_sheaf()
> might be especially ineffective when replacing a sheaf with a new one of
> a larger capacity. It is therefore better to size cache's
> sheaf_capacity accordingly to make oversize sheaves exceptional.
>
> CONFIG_SLUB_STATS counters are added for sheaf prefill and return
> operations. A prefill or return is considered _fast when it is able to
> grab or return a percpu spare sheaf (even if the sheaf needs a refill to
> satisfy the request, as those should amortize over time), and _slow
> otherwise (when the barn or even sheaf allocation/freeing has to be
> involved). sheaf_prefill_oversize is provided to determine how many
> prefills were oversize (counter for oversize returns is not necessary as
> all oversize refills result in oversize returns).
>
> When slub_debug is enabled for a cache with sheaves, no percpu sheaves
> exist for it, but the prefill functionality is still provided simply by
> all prefilled sheaves becoming oversize. If percpu sheaves are not
> created for a cache due to not passing the sheaf_capacity argument on
> cache creation, the prefills also work through oversize sheaves, but
> there's a WARN_ON_ONCE() to indicate the omission.
>
> Signed-off-by: Vlastimil Babka 
> Reviewed-by: Suren Baghdasaryan 
> ---
>  include/linux/slab.h |  16 
>  mm/slub.c| 265 
> +++
>  2 files changed, 281 insertions(+)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 
> 4cb495d55fc58c70a992ee4782d7990ce1c55dc6..b0a9ba33abae22bf38cbf1689e3c08bb0b05002f
>  100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -829,6 +829,22 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, 
> gfp_t flags,
>int node) __assume_slab_alignment __malloc;
>  #define kmem_cache_alloc_node(...) 
> alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
>
> +struct slab_sheaf *
> +kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size);
> +
> +int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
> +   struct slab_sheaf **sheafp, unsigned int size);
> +
> +void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
> +  struct slab_sheaf *sheaf);
> +
> +void *kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *cachep, gfp_t 
> gfp,
> +   struct slab_sheaf *sheaf) __assume_slab_alignment 
> __malloc;
> +#define kmem_cache_alloc_from_sheaf(...)   \
> +   
> alloc_hooks(kmem_cache_alloc_from_sheaf_noprof(__VA_ARGS__))
> +
> +unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf);
> +
>  /*
>   * These macros allow declaring a kmem_buckets * parameter alongside size, 
> which
>   * can be compiled out with CONFIG_SLAB_BUCKETS=n so that a large number of 
> call
> diff --git a/mm/slub.c b/mm/slub.c
> index 
> 6f31a27b5d47fa6621fa8af6d6842564077d4b60..724266fdd996c091f1f0b34012c5179f17dfa422
>  100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -384,6 +384,11 @@ enum stat_item {
> BARN_GET_FAIL,  /* Failed to get full sheaf from barn */
> BARN_PUT,   /* Put full sheaf to barn */
> BARN_PUT_FAIL,  /* Failed to put full sheaf to barn */
> +   SHEAF_PREFILL_FAST, /* Sheaf prefill grabbed the spare sheaf */
> +   SHEAF_PREFILL_SLOW, /* Sheaf prefill found no spare sheaf */
> +   SHEAF_PREFILL_OVERSIZE, /* Allocation of oversize sheaf for prefill */
> +   SHEAF_RETURN_FAST,  /* Sheaf return reattached spare

Re: [PATCH] kbuild: use ARCH from compile.h in unclean source tree msg

2025-05-06 Thread Shuah Khan


On 5/6/25 16:07, Shuah Khan wrote:

On 5/6/25 05:12, Nicolas Schier wrote:

On Fri, 02 May 2025, Shuah Khan wrote:


When make finds the source tree unclean, it prints a message to run
"make ARCH=x86_64 mrproper" message using the ARCH from the command
line. The ARCH specified in the command line could be different from
the ARCH of the existing build in the source tree.

This could cause problems in regular kernel build and kunit workflows.

Regular workflow:

- Build x86_64 kernel
$ make ARCH=x86_64
- Try building another arch kernel out of tree with O=
$ make ARCH=um O=/linux/build
- kbuild detects source tree is unclean

   ***
   *** The source tree is not clean, please run 'make ARCH=um mrproper'
   *** in /linux/linux_srcdir
   ***

- Clean source tree as suggested by kbuild
$ make ARCH=um mrproper
- Source clean appears to be clean, but it leaves behind generated header
   files under arch/x86
  arch/x86/realmode/rm/pasyms.h

A subsequent x86_64e build fails with
   "undefined symbol sev_es_trampoline_start referenced ..."

kunit workflow runs into this issue:

- Build x86_64 kernel
- Run kunit tests:  it tries to build for user specified ARCH or uml
   as default:
$ ./tools/testing/kunit/kunit.py run

- kbuild detects unclean source tree

   ***
   *** The source tree is not clean, please run 'make ARCH=um mrproper'
   *** in /linux/linux_6.15
   ***

- Clean source tree as suggested by kbuild
$ make ARCH=um mrproper
- Source clean appears to be clean, but it leaves behind generated header
   files under arch/x86

The problem shows when user tries to run tests on ARCH=x86_64:

$ ./tools/testing/kunit/kunit.py run ARCH=x86_64

"undefined symbol sev_es_trampoline_start referenced ..."

Build trips on arch/x86/realmode/rm/pasyms.h left behind by a prior
x86_64 build.

Problems related to partially cleaned source tree are hard to debug.
Change Makefile to unclean source logic to use ARCH from compile.h
UTS_MACHINE string. With this change kbuild prints:

$ ./tools/testing/kunit/kunit.py run

   ***
   *** The source tree is not clean, please run 'make ARCH=x86_64 mrproper'
   *** in /linux/linux_6.15
   ***

Signed-off-by: Shuah Khan 
---
  Makefile | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 5aa9ee52a765..7ee29136b4da 100644
--- a/Makefile
+++ b/Makefile
@@ -674,7 +674,7 @@ ifeq ($(KBUILD_EXTMOD),)
   -d $(srctree)/include/config -o \
   -d $(srctree)/arch/$(SRCARCH)/include/generated ]; then \


Would it make sense to check for include/generated as a catch all?


Adding check is good, but it won't cover the compile.h missing. I don't
know if compile.h could go missing if include/generated exists. In any
case, it is good to check for compile exists or not and print appropriate
message for these cases.

I have the change working. Will send it out. Thanks for the tip on
the (rare) cases.

thanks,
-- Shuah

Re: [PATCH 0/4] Add *_wait_val values for GDSCs in all SM6350 clock drivers

2025-05-06 Thread Bjorn Andersson



On Fri, 25 Apr 2025 14:12:54 +0200, Luca Weiss wrote:
> As described in the commit messages, keep the GDSC configs aligned with
> the downstream kernel.
> 
> For reference, this was checked using the following code:
> 
> To: Bjorn Andersson 
> To: Michael Turquette 
> To: Stephen Boyd 
> To: Konrad Dybcio 
> To: AngeloGioacchino Del Regno 
> Cc: ~postmarketos/upstream...@lists.sr.ht
> Cc: phone-de...@vger.kernel.org
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> 
> [...]

Applied, thanks!

[0/4] Add *_wait_val values for GDSCs in all SM6350 clock drivers
  (no commit info)
[1/4] clk: qcom: camcc-sm6350: Add *_wait_val values for GDSCs
  commit: e7b1c13280ad866f3b935f6c658713c41db61635
[2/4] clk: qcom: dispcc-sm6350: Add *_wait_val values for GDSCs
  commit: 673989d27123618afab56df1143a75454178b4ae
[3/4] clk: qcom: gcc-sm6350: Add *_wait_val values for GDSCs
  commit: afdfd829a99e467869e3ca1955fb6c6e337c340a
[4/4] clk: qcom: gpucc-sm6350: Add *_wait_val values for GDSCs
  commit: d988b0b866c2aeb23aa74022b5bbd463165a7a33

Best regards,
-- 
Bjorn Andersson

Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-05-06 Thread Nataliia Bondarevska

Tested-by: Nataliia Bondarevska 




On Fri, May 2, 2025 at 1:56 AM Jarkko Sakkinen  wrote:
>
> On Fri, 2025-05-02 at 07:22 +, Reshetova, Elena wrote:
> >
> > >
> > > On Wed, Apr 30, 2025 at 06:53:32AM +, Reshetova, Elena wrote:
> > > > 2. Switch to Sean's approach to execute EUPDATESVN during the
> > > sgx_open().
> > > > Btw, Sean do you agree that we don't gain much doing it second
> > > > time during
> > > > release() given the microcode flow?
> > > > I would rather leave only one invocation of eupdatesvn during
> > > sgx_inc_usage_count().
> > > >
> > > > Proc: No new uABI. More predictable on svn change compared to
> > > > option 1.
> > >
> > > > Cons: Two explicit paths to hook: sgx_open() and sgx_vepc_open().
> > >
> > > Why this is a con?
> >
> > Well, just from the pov of not having a single path to enable.
> > Are you ok with option 2?
> >
>
> Yep, as SGX is anyway very much run-time managed feature and these
> hooks fit better on how it is used.
>
> > Best Regards,
> > Elena.
>
> BR, Jarkko
>

Re: [PATCH v4 8/9] mm, vma: use percpu sheaves for vm_area_struct cache

2025-05-06 Thread Suren Baghdasaryan

On Fri, Apr 25, 2025 at 1:28 AM Vlastimil Babka  wrote:
>
> Create the vm_area_struct cache with percpu sheaves of size 32 to
> improve its performance.
>
> Signed-off-by: Vlastimil Babka 

I think Lorenzo's refactoring moved this code out of fork.c, so it
will have to be adjusted.

Reviewed-by: Suren Baghdasaryan 

> ---
>  kernel/fork.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 
> c4b26cd8998b8e7b2b516e0bb0b1d4676ff644dc..3bd711f0798c88aee04bc30ff21fc4ca2b66201a
>  100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -3216,6 +3216,7 @@ void __init proc_caches_init(void)
> struct kmem_cache_args args = {
> .use_freeptr_offset = true,
> .freeptr_offset = offsetof(struct vm_area_struct, vm_freeptr),
> +   .sheaf_capacity = 32,
> };
>
> sighand_cachep = kmem_cache_create("sighand_cache",
>
> --
> 2.49.0
>

Re: [PATCH v2 0/4] of: Common "memory-region" parsing

2025-05-06 Thread Peng Fan

On Wed, Apr 23, 2025 at 02:42:12PM -0500, Rob Herring (Arm) wrote:
>While there's a common function to parse "memory-region" properties for
>DMA pool regions, there's not anything for driver private regions. As a
>result, drivers have resorted to parsing "memory-region" properties
>themselves repeating the same pattern over and over. To fix this, this
>series adds 2 functions to handle those cases:
>of_reserved_mem_region_to_resource() and of_reserved_mem_region_count().
>
>I've converted the whole tree, but just including remoteproc here as
>it has the most cases. I intend to apply the first 3 patches for 6.16
>so the driver conversions can be applied for 6.17.
>
>A git tree with all the drivers converted is here[1].
>
>v2:
>- Fix of_dma_set_restricted_buffer() to maintain behavior on warning msg
>- Export devm_ioremap_resource_wc()
>- Rework handling of resource name to drop unit-address from name as it 
>  was before.
>- Link to v1: 
>  https://lore.kernel.org/all/20250317232426.952188-1-r...@kernel.org
>
>[1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git 
>dt/memory-region
>
>Signed-off-by: Rob Herring (Arm) 
>---
>Rob Herring (Arm) (4):
>  of: reserved_mem: Add functions to parse "memory-region"
>  of: Simplify of_dma_set_restricted_buffer() to use of_for_each_phandle()
>  devres: Export devm_ioremap_resource_wc()
>  remoteproc: Use of_reserved_mem_region_* functions for "memory-region"
>
> drivers/of/device.c   | 31 +---
> drivers/of/of_reserved_mem.c  | 80 +++
> drivers/remoteproc/imx_dsp_rproc.c| 45 +++--
> drivers/remoteproc/imx_rproc.c| 68 +++---

Tested-by: Peng Fan  (i.MX93-11x11-EVK for imx_rproc.c)

Thanks,
Peng

Re: [PATCH v4 00/38] Mediated vPMU 4.0 for x86

2025-05-06 Thread Mi, Dapeng



On 5/7/2025 3:45 AM, Sean Christopherson wrote:
> On Tue, May 06, 2025, Dapeng Mi wrote:
>> Hi Sean,
>>
>> Not sure if you have bandwidth to review this mediated vPMU v4 patchset?
> I'm getting there.  I wanted to get through all the stuff I thought would 
> likely
> be ready for 6.16 as-is before moving onto the larger series.

Got it. Thanks.


>

Re: [RFC PATCH 39/39] KVM: guest_memfd: Dynamically split/reconstruct HugeTLB page

2025-05-06 Thread Yan Zhao

On Tue, May 06, 2025 at 12:22:47PM -0700, Ackerley Tng wrote:
> Yan Zhao  writes:
> 
> >> > 
> >> >
> >> > What options does userspace have in this scenario?
> >> > It can't reduce the flag to KVM_GUEST_MEMFD_HUGE_2MB. Adjusting the 
> >> > gmem.pgoff
> >> > isn't ideal either.
> >> >
> >> > What about something similar as below?
> >> >
> >> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> >> > index d2feacd14786..87c33704a748 100644
> >> > --- a/virt/kvm/guest_memfd.c
> >> > +++ b/virt/kvm/guest_memfd.c
> >> > @@ -1842,8 +1842,16 @@ __kvm_gmem_get_pfn(struct file *file, struct 
> >> > kvm_memory_slot *slot,
> >> > }
> >> >
> >> > *pfn = folio_file_pfn(folio, index);
> >> > -   if (max_order)
> >> > -   *max_order = folio_order(folio);
> >> > +   if (max_order) {
> >> > +   int order;
> >> > +
> >> > +   order = folio_order(folio);
> >> > +
> >> > +   while (order > 0 && ((slot->base_gfn ^ slot->gmem.pgoff) 
> >> > & ((1 << order) - 1)))
> >> > +   order--;
> >> > +
> >> > +   *max_order = order;
> >> > +   }
> >> >
> >> > *is_prepared = folio_test_uptodate(folio);
> >> > return folio;
> >> >
> >> 
> >> Vishal was wondering how this is working before guest_memfd was
> >> introduced, for other backing memory like HugeTLB.
> >> 
> >> I then poked around and found this [1]. I will be adding a similar check
> >> for any slot where kvm_slot_can_be_private(slot).
> >>
> >> Yan, that should work, right?
> > No, I don't think the checking of ugfn [1] should work.
> >
> > 1. Even for slots bound to in-place-conversion guest_memfd (i.e. shared 
> > memory
> > are allocated from guest_memfd), the slot->userspace_addr does not 
> > necessarily
> > have the same offset as slot->gmem.pgoff. Even if we audit the offset in
> > kvm_gmem_bind(), userspace could invoke munmap() and mmap() afterwards, 
> > causing
> > slot->userspace_addr to point to a different offset.
> >
> > 2. for slots bound to guest_memfd that do not support in-place-conversion,
> > shared memory is allocated from a different backend. Therefore, checking
> > "slot->base_gfn ^ slot->gmem.pgoff" is required for private memory. The 
> > check is
> > currently absent because guest_memfd supports 4K only.
> >
> >
> 
> Let me clarify, I meant these changes:
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4b64ab3..d0dccf1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12938,6 +12938,11 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, 
> unsigned long npages)
> return 0;
>  }
>  
> +static inline bool kvm_is_level_aligned(u64 value, int level)
> +{
> +   return IS_ALIGNED(value, KVM_PAGES_PER_HPAGE(level));
> +}
> +
>  static int kvm_alloc_memslot_metadata(struct kvm *kvm,
>   struct kvm_memory_slot *slot)
>  {
> @@ -12971,16 +12976,20 @@ static int kvm_alloc_memslot_metadata(struct kvm 
> *kvm,
>  
> slot->arch.lpage_info[i - 1] = linfo;
>  
> -   if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
> +   if (!kvm_is_level_aligned(slot->base_gfn, level))
> linfo[0].disallow_lpage = 1;
> -   if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 
> 1))
> +   if (!kvm_is_level_aligned(slot->base_gfn + npages, level))
> linfo[lpages - 1].disallow_lpage = 1;
> ugfn = slot->userspace_addr >> PAGE_SHIFT;
> /*
> -* If the gfn and userspace address are not aligned wrt each
> -* other, disable large page support for this slot.
> +* If the gfn and userspace address are not aligned or if gfn
> +* and guest_memfd offset are not aligned wrt each other,
> +* disable large page support for this slot.
>  */
> -   if ((slot->base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 
> 1)) {
> +   if (!kvm_is_level_aligned(slot->base_gfn ^ ugfn, level) ||
> +   (kvm_slot_can_be_private(slot) &&
> +!kvm_is_level_aligned(slot->base_gfn ^ slot->gmem.pgoff,
> +  level))) {
> unsigned long j;
>  
> for (j = 0; j < lpages; ++j)
> 
> This does not rely on the ugfn check, but adds a similar check for gmem.pgoff.
In the case of shared memory is not allocated from guest_memfd, (e.g. with the
current upstream code), the checking of gmem.pgoff here will disallow huge page
of shared memory even if "slot->base_gfn ^ ugfn" is aligned.

> I think this should take care of case (1.), for guest_memfds going to be
> used for both shared and private memory. Userspace can't update
> slot->userspace_addr, since guest_memfd memslots cannot be updated and
> can only be deleted.
> 
> If userspa

Re: [PATCH 3/6] arm64: dts: qcom: qcs615: Add mproc node for SEMP2P

2025-05-06 Thread Lijuan Gao





在 4/23/2025 5:29 PM, Konrad Dybcio 写道:

On 4/23/25 11:17 AM, Lijuan Gao wrote:

From: Kyle Deng 

The Shared Memory Point to Point (SMP2P) protocol facilitates
communication of a single 32-bit value between two processors.
Add these two nodes for remoteproc enablement on QCS615 SoC.

Signed-off-by: Kyle Deng 
Signed-off-by: Lijuan Gao 
---
  arch/arm64/boot/dts/qcom/qcs615.dtsi | 79 
  1 file changed, 79 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qcs615.dtsi 
b/arch/arm64/boot/dts/qcom/qcs615.dtsi
index edfb796d8dd3..ab3c6ba5842b 100644
--- a/arch/arm64/boot/dts/qcom/qcs615.dtsi
+++ b/arch/arm64/boot/dts/qcom/qcs615.dtsi
@@ -332,6 +332,80 @@ mc_virt: interconnect-2 {
qcom,bcm-voters = <&apps_bcm_voter>;
};
  
+	qcom,smp2p-adsp {


Remove the qcom prefix


Understood, it will be updated in the next patch.



+   compatible = "qcom,smp2p";
+   qcom,smem = <443>, <429>;
+   interrupts = ;
+   mboxes = <&apss_shared 26>;
+   qcom,ipc = <&apcs 0 26>;
+   qcom,local-pid = <0>;
+   qcom,remote-pid = <2>;
+
+   adsp_smp2p_out: master-kernel {
+   qcom,entry-name = "master-kernel";
+   #qcom,smem-state-cells = <1>;
+   };
+
+   adsp_smp2p_in: slave-kernel {
+   qcom,entry-name = "slave-kernel";
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   };
+
+   sleepstate_smp2p_out: sleepstate-out {
+   qcom,entry-name = "sleepstate";
+   #qcom,smem-state-cells = <1>;
+   };
+
+   sleepstate_smp2p_in: qcom,sleepstate-in {
+   qcom,entry-name = "sleepstate_see";
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   };
+   smp2p_rdbg2_out: qcom,smp2p-rdbg2-out {
+   qcom,entry-name = "rdbg";
+   #qcom,smem-state-cells = <1>;
+   };
+
+   smp2p_rdbg2_in: qcom,smp2p-rdbg2-in {
+   qcom,entry-name = "rdbg";
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   };
+   };
+
+   qcom,smp2p-cdsp {
+   compatible = "qcom,smp2p";
+   qcom,smem = <94>, <432>;
+   interrupts = ;
+   mboxes = <&apss_shared 6>;
+   qcom,ipc = <&apcs 0 6>;
+   qcom,local-pid = <0>;
+   qcom,remote-pid = <5>;
+
+   cdsp_smp2p_out: master-kernel {
+   qcom,entry-name = "master-kernel";
+   #qcom,smem-state-cells = <1>;
+   };
+
+   cdsp_smp2p_in: slave-kernel {
+   qcom,entry-name = "slave-kernel";
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   };
+
+   smp2p_rdbg5_out: qcom,smp2p-rdbg5-out {
+   qcom,entry-name = "rdbg";
+   #qcom,smem-state-cells = <1>;
+   };
+
+   smp2p_rdbg5_in: qcom,smp2p-rdbg5-in {
+   qcom,entry-name = "rdbg";
+   interrupt-controller;
+   #interrupt-cells = <2>;
+   };
+   };
+
qup_opp_table: opp-table-qup {
compatible = "operating-points-v2";
opp-shared;
@@ -3337,6 +3411,11 @@ apss_shared: mailbox@17c0 {
#mbox-cells = <1>;
};
  
+		apcs: syscon@17cc {

+   compatible = "syscon";


There is already a description for this block above what you added

qcom,ipc under smp2p is mutually exclusive with `mboxes`, so adding
the above isn't necessary at all

Konrad


Understood, I will remove the qcom,ipc in next patch.

--
Thx and BRs
Lijuan Gao

Re: [PATCH bpf-next v3 3/3] libbpf: Use mmap to parse vmlinux BTF from sysfs

2025-05-06 Thread Andrii Nakryiko

On Mon, May 5, 2025 at 11:39 AM Lorenz Bauer  wrote:
>
> Teach libbpf to use mmap when parsing vmlinux BTF from /sys. We don't
> apply this to fall-back paths on the regular file system because there
> is no way to ensure that modifications underlying the MAP_PRIVATE
> mapping are not visible to the process.
>
> Signed-off-by: Lorenz Bauer 
> ---
>  tools/lib/bpf/btf.c | 83 
> ++---
>  1 file changed, 72 insertions(+), 11 deletions(-)
>

[...]

> @@ -1030,7 +1045,7 @@ struct btf *btf__new_empty_split(struct btf *base_btf)
> return libbpf_ptr(btf_new_empty(base_btf));
>  }
>
> -static struct btf *btf_new(const void *data, __u32 size, struct btf 
> *base_btf)
> +static struct btf *btf_new_no_copy(void *data, __u32 size, struct btf 
> *base_btf)
>  {
> struct btf *btf;
> int err;
> @@ -1050,12 +1065,7 @@ static struct btf *btf_new(const void *data, __u32 
> size, struct btf *base_btf)
> btf->start_str_off = base_btf->hdr->str_len;
> }
>
> -   btf->raw_data = malloc(size);
> -   if (!btf->raw_data) {
> -   err = -ENOMEM;
> -   goto done;
> -   }
> -   memcpy(btf->raw_data, data, size);
> +   btf->raw_data = data;
> btf->raw_size = size;
>
> btf->hdr = btf->raw_data;
> @@ -1081,6 +1091,24 @@ static struct btf *btf_new(const void *data, __u32 
> size, struct btf *base_btf)
> return btf;
>  }
>
> +static struct btf *btf_new(const void *data, __u32 size, struct btf 
> *base_btf)

btf_new() is internal, so I'd extend existing btf_new() with `bool
is_mmap` and not add btf_new_no_copy(), I think it's simpler.
Eventually we can turn is_mmap into some sort of flags, if we need
more tuning of data ownership behavior

> +{
> +   struct btf *btf;
> +   void *raw_data;
> +
> +   raw_data = malloc(size);
> +   if (!raw_data)
> +   return ERR_PTR(-ENOMEM);
> +
> +   memcpy(raw_data, data, size);
> +
> +   btf = btf_new_no_copy(raw_data, size, base_btf);
> +   if (IS_ERR(btf))
> +   free(raw_data);
> +
> +   return btf;
> +}
> +
>  struct btf *btf__new(const void *data, __u32 size)
>  {
> return libbpf_ptr(btf_new(data, size, NULL));
> @@ -1354,6 +1382,37 @@ struct btf *btf__parse_raw_split(const char *path, 
> struct btf *base_btf)
> return libbpf_ptr(btf_parse_raw(path, base_btf));
>  }
>
> +static struct btf *btf_parse_raw_mmap(const char *path, struct btf *base_btf)
> +{
> +   struct stat st;
> +   void *data;
> +   struct btf *btf;
> +   int fd;
> +
> +   fd = open(path, O_RDONLY);
> +   if (fd < 0)
> +   return libbpf_err_ptr(-errno);
> +
> +   if (fstat(fd, &st) < 0) {
> +   close(fd);

close() can clobber errno, so save `err = -errno` before it

> +   return libbpf_err_ptr(-errno);
> +   }
> +
> +   data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +   close(fd);

same, errno clobbering danger

> +
> +   if (data == MAP_FAILED)
> +   return NULL;
> +
> +   btf = btf_new_no_copy(data, st.st_size, base_btf);
> +   if (!btf)

btf_new_no_copy() is returning ERR_PTR() on error, no?

pw-bot: cr

> +   munmap(data, st.st_size);
> +   else
> +   btf->raw_data_is_mmap = true;
> +
> +   return btf;
> +}
> +
>  static struct btf *btf_parse(const char *path, struct btf *base_btf, struct 
> btf_ext **btf_ext)
>  {
> struct btf *btf;
> @@ -1659,8 +1718,7 @@ struct btf *btf__load_from_kernel_by_id(__u32 id)
>  static void btf_invalidate_raw_data(struct btf *btf)
>  {
> if (btf->raw_data) {
> -   free(btf->raw_data);
> -   btf->raw_data = NULL;
> +   btf_free_raw_data(btf);
> }
> if (btf->raw_data_swapped) {
> free(btf->raw_data_swapped);
> @@ -5290,7 +5348,10 @@ struct btf *btf__load_vmlinux_btf(void)
> pr_warn("kernel BTF is missing at '%s', was 
> CONFIG_DEBUG_INFO_BTF enabled?\n",
> sysfs_btf_path);
> } else {
> -   btf = btf__parse(sysfs_btf_path, NULL);
> +   btf = btf_parse_raw_mmap(sysfs_btf_path, NULL);
> +   if (IS_ERR_OR_NULL(btf))
> +   btf = btf__parse(sysfs_btf_path, NULL);
> +
> if (!btf) {
> err = -errno;
> pr_warn("failed to read kernel BTF from '%s': %s\n",
>
> --
> 2.49.0
>

Re: [PATCH bpf-next v3 1/3] btf: allow mmap of vmlinux btf

2025-05-06 Thread Andrii Nakryiko

On Mon, May 5, 2025 at 11:39 AM Lorenz Bauer  wrote:
>
> User space needs access to kernel BTF for many modern features of BPF.
> Right now each process needs to read the BTF blob either in pieces or
> as a whole. Allow mmaping the sysfs file so that processes can directly
> access the memory allocated for it in the kernel.
>
> Signed-off-by: Lorenz Bauer 
> ---
>  include/asm-generic/vmlinux.lds.h |  3 ++-
>  kernel/bpf/sysfs_btf.c| 37 +
>  2 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index 
> 58a635a6d5bdf0c53c267c2a3d21a5ed8678ce73..1750390735fac7637cc4d2fa05f96cb2a36aa448
>  100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -667,10 +667,11 @@ defined(CONFIG_AUTOFDO_CLANG) || 
> defined(CONFIG_PROPELLER_CLANG)
>   */
>  #ifdef CONFIG_DEBUG_INFO_BTF
>  #define BTF\
> +   . = ALIGN(PAGE_SIZE);   \
> .BTF : AT(ADDR(.BTF) - LOAD_OFFSET) {   \
> BOUNDED_SECTION_BY(.BTF, _BTF)  \
> }   \
> -   . = ALIGN(4);   \
> +   . = ALIGN(PAGE_SIZE);   \
> .BTF_ids : AT(ADDR(.BTF_ids) - LOAD_OFFSET) {   \
> *(.BTF_ids) \
> }
> diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
> index 
> 81d6cf90584a7157929c50f62a5c6862e7a3d081..37278d7f38ae72f2d7efcfa859e86aaf12e39a25
>  100644
> --- a/kernel/bpf/sysfs_btf.c
> +++ b/kernel/bpf/sysfs_btf.c
> @@ -7,14 +7,51 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>
>  /* See scripts/link-vmlinux.sh, gen_btf() func for details */
>  extern char __start_BTF[];
>  extern char __stop_BTF[];
>
> +static int btf_sysfs_vmlinux_mmap(struct file *filp, struct kobject *kobj,
> + const struct bin_attribute *attr,
> + struct vm_area_struct *vma)
> +{
> +   unsigned long pages = PAGE_ALIGN(attr->size) >> PAGE_SHIFT;
> +   size_t vm_size = vma->vm_end - vma->vm_start;
> +   unsigned long addr = (unsigned long)attr->private;
> +   int i, err = 0;
> +
> +   if (addr != (unsigned long)__start_BTF || !PAGE_ALIGNED(addr))
> +   return -EINVAL;
> +
> +   if (vma->vm_pgoff)
> +   return -EINVAL;

any particular reason to not allow vm_pgoff?

> +
> +   if (vma->vm_flags & (VM_WRITE | VM_EXEC | VM_MAYSHARE))
> +   return -EACCES;
> +
> +   if (vm_size >> PAGE_SHIFT > pages)

() around shift operation, please, for those of us who haven't
memorized the entire C operator precedence table ;)

> +   return -EINVAL;
> +
> +   vm_flags_mod(vma, VM_DONTDUMP, VM_MAYEXEC | VM_MAYWRITE);
> +
> +   for (i = 0; i < pages && !err; i++, addr += PAGE_SIZE)
> +   err = vm_insert_page(vma, vma->vm_start + i * PAGE_SIZE,
> +virt_to_page(addr));
> +
> +   if (err)
> +   zap_vma_pages(vma);

it's certainly subjective, but I find this error handling with !err in
for loop condition hard to follow. What's wrong with arguably more
straightforward (and as you can see I'm not a big fan of mutated addr
but calculated vma->vm_start + i * PAGE_SIZE: pick one style one
follow it for both entities?):


for (i = 0; i < pages; i++) {
err = vm_insert_page(vma, vma->vm_start + i * PAGE_SIZE,
 virt_to_page(addr + i * PAGE_SIZE));
if (err) {
zap_vma_pages(vma);
return err;
}
}

return 0;

?


> +
> +   return err;
> +}
> +
>  static struct bin_attribute bin_attr_btf_vmlinux __ro_after_init = {
> .attr = { .name = "vmlinux", .mode = 0444, },
> .read_new = sysfs_bin_attr_simple_read,
> +   .mmap = btf_sysfs_vmlinux_mmap,
>  };
>
>  struct kobject *btf_kobj;
>
> --
> 2.49.0
>

Re: [PATCH bpf-next v3 2/3] selftests: bpf: add a test for mmapable vmlinux BTF

2025-05-06 Thread Andrii Nakryiko

On Mon, May 5, 2025 at 11:39 AM Lorenz Bauer  wrote:
>
> Add a basic test for the ability to mmap /sys/kernel/btf/vmlinux. Since
> libbpf doesn't have an API to parse BTF from memory we do some basic
> sanity checks ourselves.
>
> Signed-off-by: Lorenz Bauer 
> ---
>  tools/testing/selftests/bpf/prog_tests/btf_sysfs.c | 83 
> ++
>  1 file changed, 83 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/btf_sysfs.c 
> b/tools/testing/selftests/bpf/prog_tests/btf_sysfs.c
> new file mode 100644
> index 
> ..3319cf758897d46cefa8ca25e16acb162f4e9889
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/btf_sysfs.c
> @@ -0,0 +1,83 @@
> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> +/* Copyright (c) 2025 Isovalent */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static void test_btf_mmap_sysfs(const char *path, struct btf *base)
> +{
> +   struct stat st;
> +   __u64 btf_size, end;
> +   void *raw_data = NULL;
> +   int fd = -1;
> +   long page_size;
> +   struct btf *btf = NULL;
> +
> +   page_size = sysconf(_SC_PAGESIZE);
> +   if (!ASSERT_GE(page_size, 0, "get_page_size"))
> +   goto cleanup;
> +
> +   if (!ASSERT_OK(stat(path, &st), "stat_btf"))
> +   goto cleanup;
> +
> +   btf_size = st.st_size;
> +   end = (btf_size + page_size - 1) / page_size * page_size;
> +
> +   fd = open(path, O_RDONLY);
> +   if (!ASSERT_GE(fd, 0, "open_btf"))
> +   goto cleanup;
> +
> +   raw_data = mmap(NULL, btf_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, 
> fd, 0);
> +   if (!ASSERT_EQ(raw_data, MAP_FAILED, "mmap_btf_writable"))
> +   goto cleanup;
> +
> +   raw_data = mmap(NULL, btf_size, PROT_READ, MAP_SHARED, fd, 0);
> +   if (!ASSERT_EQ(raw_data, MAP_FAILED, "mmap_btf_shared"))
> +   goto cleanup;
> +
> +   raw_data = mmap(NULL, end + 1, PROT_READ, MAP_PRIVATE, fd, 0);
> +   if (!ASSERT_EQ(raw_data, MAP_FAILED, "mmap_btf_invalid_size"))
> +   goto cleanup;
> +
> +   raw_data = mmap(NULL, end, PROT_READ, MAP_PRIVATE, fd, 0);
> +   if (!ASSERT_NEQ(raw_data, MAP_FAILED, "mmap_btf"))

ASSERT_OK_PTR()?

> +   goto cleanup;
> +
> +   if (!ASSERT_EQ(mprotect(raw_data, btf_size, PROT_READ | PROT_WRITE), 
> -1,
> +   "mprotect_writable"))
> +   goto cleanup;
> +
> +   if (!ASSERT_EQ(mprotect(raw_data, btf_size, PROT_READ | PROT_EXEC), 
> -1,
> +   "mprotect_executable"))
> +   goto cleanup;
> +
> +   /* Check padding is zeroed */
> +   for (int i = btf_size; i < end; i++) {
> +   if (((__u8 *)raw_data)[i] != 0) {
> +   PRINT_FAIL("tail of BTF is not zero at page offset 
> %d\n", i);
> +   goto cleanup;
> +   }
> +   }
> +
> +   btf = btf__new_split(raw_data, btf_size, base);
> +   if (!ASSERT_NEQ(btf, NULL, "parse_btf"))

ASSERT_OK_PTR()

> +   goto cleanup;
> +
> +cleanup:
> +   if (raw_data && raw_data != MAP_FAILED)
> +   munmap(raw_data, btf_size);
> +   if (btf)

no need to check this, all libbpf destructor APIs deal with NULL
correctly (ignoring them)

> +   btf__free(btf);
> +   if (fd >= 0)
> +   close(fd);
> +}
> +
> +void test_btf_sysfs(void)
> +{
> +   if (test__start_subtest("vmlinux"))
> +   test_btf_mmap_sysfs("/sys/kernel/btf/vmlinux", NULL);

Do you intend to add more subtests? if not, why even using a subtest structure

> +}

>
> --
> 2.49.0
>

Re: [PATCH] clk: test: Forward-declare struct of_phandle_args in kunit/clk.h

2025-05-06 Thread Stephen Boyd

Quoting Richard Fitzgerald (2025-03-27 05:52:14)
> Add a forward-declare of struct of_phandle_args to prevent the compiler
> warning:
> 
> ../include/kunit/clk.h:29:63: warning: ‘struct of_phandle_args’ declared
> inside parameter list will not be visible outside of this definition or
> declaration
>struct clk_hw *(*get)(struct of_phandle_args *clkspec, void *data),
> 
> Signed-off-by: Richard Fitzgerald 
> ---

Applied to clk-next

Re: [PATCH net-next v4 3/3] vsock/test: Expand linger test to ensure close() does not misbehave

2025-05-06 Thread Michal Luczaj

On 5/6/25 11:46, Stefano Garzarella wrote:
> On Tue, 6 May 2025 at 11:43, Stefano Garzarella  wrote:
>>
>> On Thu, May 01, 2025 at 10:05:24AM +0200, Michal Luczaj wrote:
>>> There was an issue with SO_LINGER: instead of blocking until all queued
>>> messages for the socket have been successfully sent (or the linger timeout
>>> has been reached), close() would block until packets were handled by the
>>> peer.
>>
>> This is a new behaviour that only new kernels will follow, so I think
>> it is better to add a new test instead of extending a pre-existing test
>> that we described as "SOCK_STREAM SO_LINGER null-ptr-deref".
>>
>> The old test should continue to check the null-ptr-deref also for old
>> kernels, while the new test will check the new behaviour, so we can skip
>> the new test while testing an old kernel.

Right, I'll split it.

> I also saw that we don't have any test to verify that actually the
> lingering is working, should we add it since we are touching it?

Yeah, I agree we should. Do you have any suggestion how this could be done
reliably?

Thanks,
Michal

Re: [PATCH net-next v4 2/3] vsock: Move lingering logic to af_vsock core

2025-05-06 Thread Michal Luczaj

On 5/6/25 11:53, Stefano Garzarella wrote:
> On Thu, May 01, 2025 at 10:05:23AM +0200, Michal Luczaj wrote:
>> Lingering should be transport-independent in the long run. In preparation
>> for supporting other transports, as well the linger on shutdown(), move
>> code to core.
>>
>> Generalize by querying vsock_transport::unsent_bytes(), guard against the
>> callback being unimplemented. Do not pass sk_lingertime explicitly. Pull
>> SOCK_LINGER check into vsock_linger().
>>
>> Flatten the function. Remove the nested block by inverting the condition:
>> return early on !timeout.
>>
>> Suggested-by: Stefano Garzarella 
>> Signed-off-by: Michal Luczaj 
>> ---
>> include/net/af_vsock.h  |  1 +
>> net/vmw_vsock/af_vsock.c| 30 ++
>> net/vmw_vsock/virtio_transport_common.c | 23 ++-
>> 3 files changed, 33 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> index 
>> 9e85424c834353d016a527070dd62e15ff3bfce1..d56e6e135158939087d060dfcf65d3fdaea53bf3
>>  100644
>> --- a/include/net/af_vsock.h
>> +++ b/include/net/af_vsock.h
>> @@ -221,6 +221,7 @@ void vsock_for_each_connected_socket(struct 
>> vsock_transport *transport,
>>   void (*fn)(struct sock *sk));
>> int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock 
>> *psk);
>> bool vsock_find_cid(unsigned int cid);
>> +void vsock_linger(struct sock *sk);
>>
>> / TAP /
>>
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index 
>> fc6afbc8d6806a4d98c66abc3af4bd139c583b08..a31ad6b141cd38d1806df4b5d417924bb8607e32
>>  100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -1013,6 +1013,36 @@ static int vsock_getname(struct socket *sock,
>>  return err;
>> }
>>
>> +void vsock_linger(struct sock *sk)
>> +{
>> +DEFINE_WAIT_FUNC(wait, woken_wake_function);
>> +ssize_t (*unsent)(struct vsock_sock *vsk);
>> +struct vsock_sock *vsk = vsock_sk(sk);
>> +long timeout;
>> +
>> +if (!sock_flag(sk, SOCK_LINGER))
>> +return;
>> +
>> +timeout = sk->sk_lingertime;
>> +if (!timeout)
>> +return;
>> +
>> +/* unsent_bytes() may be unimplemented. */
> 
> This comment IMO should be enriched, as it is now it doesn't add much to 
> the code. I'm thinking on something like this:
>  Transports must implement `unsent_bytes` if they want to support
>  SOCK_LINGER through `vsock_linger()` since we use it to check when
>  the socket can be closed.

OK, will do.

Thanks,
Michal

Re: [PATCH] kbuild: use ARCH from compile.h in unclean source tree msg

2025-05-06 Thread Shuah Khan


On 5/6/25 05:12, Nicolas Schier wrote:

On Fri, 02 May 2025, Shuah Khan wrote:


When make finds the source tree unclean, it prints a message to run
"make ARCH=x86_64 mrproper" message using the ARCH from the command
line. The ARCH specified in the command line could be different from
the ARCH of the existing build in the source tree.

This could cause problems in regular kernel build and kunit workflows.

Regular workflow:

- Build x86_64 kernel
$ make ARCH=x86_64
- Try building another arch kernel out of tree with O=
$ make ARCH=um O=/linux/build
- kbuild detects source tree is unclean

   ***
   *** The source tree is not clean, please run 'make ARCH=um mrproper'
   *** in /linux/linux_srcdir
   ***

- Clean source tree as suggested by kbuild
$ make ARCH=um mrproper
- Source clean appears to be clean, but it leaves behind generated header
   files under arch/x86
arch/x86/realmode/rm/pasyms.h

A subsequent x86_64e build fails with
   "undefined symbol sev_es_trampoline_start referenced ..."

kunit workflow runs into this issue:

- Build x86_64 kernel
- Run kunit tests:  it tries to build for user specified ARCH or uml
   as default:
$ ./tools/testing/kunit/kunit.py run

- kbuild detects unclean source tree

   ***
   *** The source tree is not clean, please run 'make ARCH=um mrproper'
   *** in /linux/linux_6.15
   ***

- Clean source tree as suggested by kbuild
$ make ARCH=um mrproper
- Source clean appears to be clean, but it leaves behind generated header
   files under arch/x86

The problem shows when user tries to run tests on ARCH=x86_64:

$ ./tools/testing/kunit/kunit.py run ARCH=x86_64

"undefined symbol sev_es_trampoline_start referenced ..."

Build trips on arch/x86/realmode/rm/pasyms.h left behind by a prior
x86_64 build.

Problems related to partially cleaned source tree are hard to debug.
Change Makefile to unclean source logic to use ARCH from compile.h
UTS_MACHINE string. With this change kbuild prints:

$ ./tools/testing/kunit/kunit.py run

   ***
   *** The source tree is not clean, please run 'make ARCH=x86_64 mrproper'
   *** in /linux/linux_6.15
   ***

Signed-off-by: Shuah Khan 
---
  Makefile | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 5aa9ee52a765..7ee29136b4da 100644
--- a/Makefile
+++ b/Makefile
@@ -674,7 +674,7 @@ ifeq ($(KBUILD_EXTMOD),)
 -d $(srctree)/include/config -o \
 -d $(srctree)/arch/$(SRCARCH)/include/generated ]; then \


Would it make sense to check for include/generated as a catch all?


echo >&2 "***"; \
-   echo >&2 "*** The source tree is not clean, please run 'make$(if 
$(findstring command line, $(origin ARCH)), ARCH=$(ARCH)) mrproper'"; \
+   echo >&2 "*** The source tree is not clean, please run 'make ARCH=$(shell 
grep UTS_MACHINE $(srctree)/include/generated/compile.h | cut -d '"' -f 2) mrproper'"; \


Please 'grep' option '-s'.

There are some (rare) occassions, when there is no include/generated/compile.h
but still the source tree will be considered to be dirty:


I considered adding a check for not finding include/generated/compile.h
and figured if include/config is found we are probably safe.

I will fix that.



 grep: ../include/generated/compile.h: No such file or directory
 ***
 *** The source tree is not clean, please run 'make ARCH= mrproper'
 ...



thanks,
-- Shuah

Re: [PATCH v4 1/9] slab: add opt-in caching layer of percpu sheaves

2025-05-06 Thread Suren Baghdasaryan

On Tue, May 6, 2025 at 10:32 AM Suren Baghdasaryan  wrote:
>
> On Mon, Apr 28, 2025 at 12:01 AM Vlastimil Babka  wrote:
> >
> > On 4/25/25 19:31, Christoph Lameter (Ampere) wrote:
> > > On Fri, 25 Apr 2025, Vlastimil Babka wrote:
> > >
> > >> @@ -4195,7 +4793,11 @@ static __fastpath_inline void 
> > >> *slab_alloc_node(struct kmem_cache *s, struct list
> > >>  if (unlikely(object))
> > >>  goto out;
> > >>
> > >> -object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
> > >> +if (s->cpu_sheaves && node == NUMA_NO_NODE)
> > >> +object = alloc_from_pcs(s, gfpflags);
> > >
> > > The node to use is determined in __slab_alloc_node() only based on the
> > > memory policy etc. NUMA_NO_NODE allocations can be redirected by memory
> > > policies and this check disables it.
> >
> > To handle that, alloc_from_pcs() contains this:
> >
> > #ifdef CONFIG_NUMA
> > if (static_branch_unlikely(&strict_numa)) {
> > if (current->mempolicy)
> > return NULL;
> > }
> > #endif
> >
> > And so there will be a fallback. It doesn't (currently) try to evaluate if
> > the local node is compatible as this is before taking the local lock (and
> > thus preventing migration).
> >
> >
> > >> @@ -4653,7 +5483,10 @@ void slab_free(struct kmem_cache *s, struct slab 
> > >> *slab, void *object,
> > >>  memcg_slab_free_hook(s, slab, &object, 1);
> > >>  alloc_tagging_slab_free_hook(s, slab, &object, 1);
> > >>
> > >> -if (likely(slab_free_hook(s, object, slab_want_init_on_free(s), 
> > >> false)))
> > >> +if (unlikely(!slab_free_hook(s, object, slab_want_init_on_free(s), 
> > >> false)))
> > >> +return;
> > >> +
> > >> +if (!s->cpu_sheaves || !free_to_pcs(s, object))
> > >>  do_slab_free(s, slab, object, object, 1, addr);
> > >>  }
> > >
> > > We free to pcs even if the object is remote?
>
> Overall the patch LGTM but I would like to hear the answer to this
> question too, please.

Ah, I reached the last patch and found the answer there:
https://lore.kernel.org/all/c60ae681-6027-0626-8d4e-5833982bf...@gentwo.org/

>
> > >
> >

Re: [PATCH v4 1/9] slab: add opt-in caching layer of percpu sheaves

2025-05-06 Thread Suren Baghdasaryan

On Fri, Apr 25, 2025 at 1:27 AM Vlastimil Babka  wrote:
>
> Specifying a non-zero value for a new struct kmem_cache_args field
> sheaf_capacity will setup a caching layer of percpu arrays called
> sheaves of given capacity for the created cache.
>
> Allocations from the cache will allocate via the percpu sheaves (main or
> spare) as long as they have no NUMA node preference. Frees will also
> put the object back into one of the sheaves.
>
> When both percpu sheaves are found empty during an allocation, an empty
> sheaf may be replaced with a full one from the per-node barn. If none
> are available and the allocation is allowed to block, an empty sheaf is
> refilled from slab(s) by an internal bulk alloc operation. When both
> percpu sheaves are full during freeing, the barn can replace a full one
> with an empty one, unless over a full sheaves limit. In that case a
> sheaf is flushed to slab(s) by an internal bulk free operation. Flushing
> sheaves and barns is also wired to the existing cpu flushing and cache
> shrinking operations.
>
> The sheaves do not distinguish NUMA locality of the cached objects. If
> an allocation is requested with kmem_cache_alloc_node() (or a mempolicy
> with strict_numa mode enabled) with a specific node (not NUMA_NO_NODE),
> the sheaves are bypassed.
>
> The bulk operations exposed to slab users also try to utilize the
> sheaves as long as the necessary (full or empty) sheaves are available
> on the cpu or in the barn. Once depleted, they will fallback to bulk
> alloc/free to slabs directly to avoid double copying.
>
> The sheaf_capacity value is exported in sysfs for observability.
>
> Sysfs CONFIG_SLUB_STATS counters alloc_cpu_sheaf and free_cpu_sheaf
> count objects allocated or freed using the sheaves (and thus not
> counting towards the other alloc/free path counters). Counters
> sheaf_refill and sheaf_flush count objects filled or flushed from or to
> slab pages, and can be used to assess how effective the caching is. The
> refill and flush operations will also count towards the usual
> alloc_fastpath/slowpath, free_fastpath/slowpath and other counters for
> the backing slabs.  For barn operations, barn_get and barn_put count how
> many full sheaves were get from or put to the barn, the _fail variants
> count how many such requests could not be satisfied mainly  because the
> barn was either empty or full. While the barn also holds empty sheaves
> to make some operations easier, these are not as critical to mandate own
> counters.  Finally, there are sheaf_alloc/sheaf_free counters.
>
> Access to the percpu sheaves is protected by local_trylock() when
> potential callers include irq context, and local_lock() otherwise (such
> as when we already know the gfp flags allow blocking). The trylock
> failures should be rare and we can easily fallback. Each per-NUMA-node
> barn has a spin_lock.
>
> When slub_debug is enabled for a cache with sheaf_capacity also
> specified, the latter is ignored so that allocations and frees reach the
> slow path where debugging hooks are processed.
>
> Signed-off-by: Vlastimil Babka 

Reviewed-by: Suren Baghdasaryan 

One nit which is barely worth mentioning.

> ---
>  include/linux/slab.h |   31 ++
>  mm/slab.h|2 +
>  mm/slab_common.c |5 +-
>  mm/slub.c| 1053 
> +++---
>  4 files changed, 1044 insertions(+), 47 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 
> d5a8ab98035cf3e3d9043e3b038e1bebeff05b52..4cb495d55fc58c70a992ee4782d7990ce1c55dc6
>  100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -335,6 +335,37 @@ struct kmem_cache_args {
>  * %NULL means no constructor.
>  */
> void (*ctor)(void *);
> +   /**
> +* @sheaf_capacity: Enable sheaves of given capacity for the cache.
> +*
> +* With a non-zero value, allocations from the cache go through 
> caching
> +* arrays called sheaves. Each cpu has a main sheaf that's always
> +* present, and a spare sheaf thay may be not present. When both 
> become
> +* empty, there's an attempt to replace an empty sheaf with a full 
> sheaf
> +* from the per-node barn.
> +*
> +* When no full sheaf is available, and gfp flags allow blocking, a
> +* sheaf is allocated and filled from slab(s) using bulk allocation.
> +* Otherwise the allocation falls back to the normal operation
> +* allocating a single object from a slab.
> +*
> +* Analogically when freeing and both percpu sheaves are full, the 
> barn
> +* may replace it with an empty sheaf, unless it's over capacity. In
> +* that case a sheaf is bulk freed to slab pages.
> +*
> +* The sheaves do not enforce NUMA placement of objects, so 
> allocations
> +* via kmem_cache_alloc_node() with a node specified other than
> +* NUMA_NO_NODE will byp

Re: [PATCH v2 3/4] media: i2c: imx214: Make use of CCS PLL calculator

2025-05-06 Thread André Apitzsch

Hi Sakari,

thanks for the feedback. One question below.

Am Dienstag, dem 06.05.2025 um 08:05 + schrieb Sakari Ailus:
> Hi André,
> 
> A few more comments below.
> 
> On Mon, May 05, 2025 at 11:05:55PM +0200, André Apitzsch via B4 Relay
> wrote:
> > From: André Apitzsch 
> > 
> > Calculate PLL parameters based on clock frequency and link
> > frequency.
> > 
> > Acked-by: Ricardo Ribalda 
> > Signed-off-by: André Apitzsch 
> > ---
> >  drivers/media/i2c/Kconfig  |   1 +
> >  drivers/media/i2c/imx214.c | 216
> > +
> >  2 files changed, 178 insertions(+), 39 deletions(-)
> > 
> > diff --git a/drivers/media/i2c/Kconfig b/drivers/media/i2c/Kconfig
> > index
> > e576b213084d232e90b7e556a7a855a3bb95544c..c8e24c42e0c4ea169f1b6cdc4
> > f2631234a51c7d9 100644
> > --- a/drivers/media/i2c/Kconfig
> > +++ b/drivers/media/i2c/Kconfig
> > @@ -141,6 +141,7 @@ config VIDEO_IMX214
> >     depends on GPIOLIB
> >     select REGMAP_I2C
> >     select V4L2_CCI_I2C
> > +   select VIDEO_CCS_PLL
> >     help
> >       This is a Video4Linux2 sensor driver for the Sony
> >       IMX214 camera.
> > diff --git a/drivers/media/i2c/imx214.c
> > b/drivers/media/i2c/imx214.c
> > index
> > 3aca6ebb02d649c1b7f0b6a6049c1e3aa3d08951..9e9be47394ec768a5b34d44b0
> > 6b5bbb0988da5a1 100644
> > --- a/drivers/media/i2c/imx214.c
> > +++ b/drivers/media/i2c/imx214.c
> > @@ -20,6 +20,8 @@
> >  #include 
> >  #include 
> >  
> > +#include "ccs-pll.h"
> > +
> >  /* Chip ID */
> >  #define IMX214_REG_CHIP_ID CCI_REG16(0x0016)
> >  #define IMX214_CHIP_ID 0x0214
> > @@ -34,7 +36,6 @@
> >  #define IMX214_DEFAULT_LINK_FREQ   6
> >  /* Keep wrong link frequency for backward compatibility */
> >  #define IMX214_DEFAULT_LINK_FREQ_LEGACY48000
> > -#define IMX214_DEFAULT_PIXEL_RATE ((IMX214_DEFAULT_LINK_FREQ *
> > 8LL) / 10)
> >  #define IMX214_FPS 30
> >  
> >  /* V-TIMING internal */
> > @@ -84,6 +85,7 @@
> >  #define IMX214_CSI_DATA_FORMAT_RAW10   0x0A0A
> >  #define IMX214_CSI_DATA_FORMAT_COMP6   0x0A06
> >  #define IMX214_CSI_DATA_FORMAT_COMP8   0x0A08
> > +#define IMX214_BITS_PER_PIXEL_MASK 0xFF
> >  
> >  #define IMX214_REG_CSI_LANE_MODE   CCI_REG8(0x0114)
> >  #define IMX214_CSI_2_LANE_MODE 1
> > @@ -249,6 +251,10 @@ struct imx214 {
> >     struct clk *xclk;
> >     struct regmap *regmap;
> >  
> > +   struct ccs_pll pll;
> > +
> > +   struct v4l2_fwnode_endpoint bus_cfg;
> > +
> >     struct v4l2_subdev sd;
> >     struct media_pad pad;
> >  
> > @@ -758,16 +764,22 @@ static int imx214_configure_pll(struct imx214
> > *imx214)
> >  {
> >     int ret = 0;
> >  
> > -   cci_write(imx214->regmap, IMX214_REG_VTPXCK_DIV, 5, &ret);
> > -   cci_write(imx214->regmap, IMX214_REG_VTSYCK_DIV, 2, &ret);
> > -   cci_write(imx214->regmap, IMX214_REG_PREPLLCK_VT_DIV, 3,
> > &ret);
> > -   cci_write(imx214->regmap, IMX214_REG_PLL_VT_MPY, 150,
> > &ret);
> > -   cci_write(imx214->regmap, IMX214_REG_OPPXCK_DIV, 10,
> > &ret);
> > -   cci_write(imx214->regmap, IMX214_REG_OPSYCK_DIV, 1, &ret);
> > +   cci_write(imx214->regmap, IMX214_REG_VTPXCK_DIV,
> > +     imx214->pll.vt_bk.pix_clk_div, &ret);
> > +   cci_write(imx214->regmap, IMX214_REG_VTSYCK_DIV,
> > +     imx214->pll.vt_bk.sys_clk_div, &ret);
> > +   cci_write(imx214->regmap, IMX214_REG_PREPLLCK_VT_DIV,
> > +     imx214->pll.vt_fr.pre_pll_clk_div, &ret);
> > +   cci_write(imx214->regmap, IMX214_REG_PLL_VT_MPY,
> > +     imx214->pll.vt_fr.pll_multiplier, &ret);
> > +   cci_write(imx214->regmap, IMX214_REG_OPPXCK_DIV,
> > +     imx214->pll.op_bk.pix_clk_div, &ret);
> > +   cci_write(imx214->regmap, IMX214_REG_OPSYCK_DIV,
> > +     imx214->pll.op_bk.sys_clk_div, &ret);
> >     cci_write(imx214->regmap, IMX214_REG_PLL_MULT_DRIV,
> >       IMX214_PLL_SINGLE, &ret);
> >     cci_write(imx214->regmap, IMX214_REG_EXCK_FREQ,
> > -     IMX214_EXCK_FREQ(IMX214_DEFAULT_CLK_FREQ /
> > 100), &ret);
> > +     IMX214_EXCK_FREQ(imx214->pll.ext_clk_freq_hz /
> > 100), &ret);
> >  
> >     return ret;
> >  }
> > @@ -872,9 +884,6 @@ static const struct v4l2_ctrl_ops
> > imx214_ctrl_ops = {
> >  
> >  static int imx214_ctrls_init(struct imx214 *imx214)
> >  {
> > -   static const s64 link_freq[] = {
> > -   IMX214_DEFAULT_LINK_FREQ
> > -   };
> >     static const struct v4l2_area unit_size = {
> >     .width = 1120,
> >     .height = 1120,
> > @@ -895,15 +904,14 @@ static int imx214_ctrls_init(struct imx214
> > *imx214)
> >     if (ret)
> >     return ret;
> >  
> > -   imx214->pixel_rate = v4l2_ctrl_new_std(ctrl_hdlr, NULL,
> > -     
> > V4L2_CID_PIXEL_RATE, 0,
> > -     
> > IMX214_DEFAULT_PIXEL_RATE, 1,
> > -     
> > IMX214_DEFAULT_PIXEL_RATE);
> > +   imx214->pixel_rate =
> > +   v4l2_ctrl_new_s

Re: [PATCH v2 3/4] media: i2c: imx214: Make use of CCS PLL calculator

2025-05-06 Thread Sakari Ailus

Hi André,

On Tue, May 06, 2025 at 10:16:23PM +0200, André Apitzsch wrote:
> Hi Sakari,
> 
> thanks for the feedback. One question below.
> 
> Am Dienstag, dem 06.05.2025 um 08:05 + schrieb Sakari Ailus:
> > Hi André,
> > 
> > A few more comments below.
> > 
> > On Mon, May 05, 2025 at 11:05:55PM +0200, André Apitzsch via B4 Relay
> > wrote:
> > > From: André Apitzsch 
> > > 
> > > Calculate PLL parameters based on clock frequency and link
> > > frequency.
> > > 
> > > Acked-by: Ricardo Ribalda 
> > > Signed-off-by: André Apitzsch 
> > > ---
> > >  drivers/media/i2c/Kconfig  |   1 +
> > >  drivers/media/i2c/imx214.c | 216
> > > +
> > >  2 files changed, 178 insertions(+), 39 deletions(-)
> > > 
> > > diff --git a/drivers/media/i2c/Kconfig b/drivers/media/i2c/Kconfig
> > > index
> > > e576b213084d232e90b7e556a7a855a3bb95544c..c8e24c42e0c4ea169f1b6cdc4
> > > f2631234a51c7d9 100644
> > > --- a/drivers/media/i2c/Kconfig
> > > +++ b/drivers/media/i2c/Kconfig
> > > @@ -141,6 +141,7 @@ config VIDEO_IMX214
> > >   depends on GPIOLIB
> > >   select REGMAP_I2C
> > >   select V4L2_CCI_I2C
> > > + select VIDEO_CCS_PLL
> > >   help
> > >     This is a Video4Linux2 sensor driver for the Sony
> > >     IMX214 camera.
> > > diff --git a/drivers/media/i2c/imx214.c
> > > b/drivers/media/i2c/imx214.c
> > > index
> > > 3aca6ebb02d649c1b7f0b6a6049c1e3aa3d08951..9e9be47394ec768a5b34d44b0
> > > 6b5bbb0988da5a1 100644
> > > --- a/drivers/media/i2c/imx214.c
> > > +++ b/drivers/media/i2c/imx214.c
> > > @@ -20,6 +20,8 @@
> > >  #include 
> > >  #include 
> > >  
> > > +#include "ccs-pll.h"
> > > +
> > >  /* Chip ID */
> > >  #define IMX214_REG_CHIP_ID   CCI_REG16(0x0016)
> > >  #define IMX214_CHIP_ID   0x0214
> > > @@ -34,7 +36,6 @@
> > >  #define IMX214_DEFAULT_LINK_FREQ 6
> > >  /* Keep wrong link frequency for backward compatibility */
> > >  #define IMX214_DEFAULT_LINK_FREQ_LEGACY  48000
> > > -#define IMX214_DEFAULT_PIXEL_RATE ((IMX214_DEFAULT_LINK_FREQ *
> > > 8LL) / 10)
> > >  #define IMX214_FPS 30
> > >  
> > >  /* V-TIMING internal */
> > > @@ -84,6 +85,7 @@
> > >  #define IMX214_CSI_DATA_FORMAT_RAW10 0x0A0A
> > >  #define IMX214_CSI_DATA_FORMAT_COMP6 0x0A06
> > >  #define IMX214_CSI_DATA_FORMAT_COMP8 0x0A08
> > > +#define IMX214_BITS_PER_PIXEL_MASK   0xFF
> > >  
> > >  #define IMX214_REG_CSI_LANE_MODE CCI_REG8(0x0114)
> > >  #define IMX214_CSI_2_LANE_MODE   1
> > > @@ -249,6 +251,10 @@ struct imx214 {
> > >   struct clk *xclk;
> > >   struct regmap *regmap;
> > >  
> > > + struct ccs_pll pll;
> > > +
> > > + struct v4l2_fwnode_endpoint bus_cfg;
> > > +
> > >   struct v4l2_subdev sd;
> > >   struct media_pad pad;
> > >  
> > > @@ -758,16 +764,22 @@ static int imx214_configure_pll(struct imx214
> > > *imx214)
> > >  {
> > >   int ret = 0;
> > >  
> > > - cci_write(imx214->regmap, IMX214_REG_VTPXCK_DIV, 5, &ret);
> > > - cci_write(imx214->regmap, IMX214_REG_VTSYCK_DIV, 2, &ret);
> > > - cci_write(imx214->regmap, IMX214_REG_PREPLLCK_VT_DIV, 3,
> > > &ret);
> > > - cci_write(imx214->regmap, IMX214_REG_PLL_VT_MPY, 150,
> > > &ret);
> > > - cci_write(imx214->regmap, IMX214_REG_OPPXCK_DIV, 10,
> > > &ret);
> > > - cci_write(imx214->regmap, IMX214_REG_OPSYCK_DIV, 1, &ret);
> > > + cci_write(imx214->regmap, IMX214_REG_VTPXCK_DIV,
> > > +   imx214->pll.vt_bk.pix_clk_div, &ret);
> > > + cci_write(imx214->regmap, IMX214_REG_VTSYCK_DIV,
> > > +   imx214->pll.vt_bk.sys_clk_div, &ret);
> > > + cci_write(imx214->regmap, IMX214_REG_PREPLLCK_VT_DIV,
> > > +   imx214->pll.vt_fr.pre_pll_clk_div, &ret);
> > > + cci_write(imx214->regmap, IMX214_REG_PLL_VT_MPY,
> > > +   imx214->pll.vt_fr.pll_multiplier, &ret);
> > > + cci_write(imx214->regmap, IMX214_REG_OPPXCK_DIV,
> > > +   imx214->pll.op_bk.pix_clk_div, &ret);
> > > + cci_write(imx214->regmap, IMX214_REG_OPSYCK_DIV,
> > > +   imx214->pll.op_bk.sys_clk_div, &ret);
> > >   cci_write(imx214->regmap, IMX214_REG_PLL_MULT_DRIV,
> > >     IMX214_PLL_SINGLE, &ret);
> > >   cci_write(imx214->regmap, IMX214_REG_EXCK_FREQ,
> > > -   IMX214_EXCK_FREQ(IMX214_DEFAULT_CLK_FREQ /
> > > 100), &ret);
> > > +   IMX214_EXCK_FREQ(imx214->pll.ext_clk_freq_hz /
> > > 100), &ret);
> > >  
> > >   return ret;
> > >  }
> > > @@ -872,9 +884,6 @@ static const struct v4l2_ctrl_ops
> > > imx214_ctrl_ops = {
> > >  
> > >  static int imx214_ctrls_init(struct imx214 *imx214)
> > >  {
> > > - static const s64 link_freq[] = {
> > > - IMX214_DEFAULT_LINK_FREQ
> > > - };
> > >   static const struct v4l2_area unit_size = {
> > >   .width = 1120,
> > >   .height = 1120,
> > > @@ -895,15 +904,14 @@ static int imx214_ctrls_init(struct imx214
> > > *imx214)
> > >   if (ret)
> > >   return ret;
> > >  
> > > - imx214->pixel_rate = v4l2_ctrl_new_std(ctrl_hdlr, NULL,
> > > -   
>

Re: [PATCH v3 3/3] selftests: pidfd: add tests for PIDFD_SELF_*

2025-05-06 Thread Shuah Khan


On 5/1/25 05:42, Peter Zijlstra wrote:

On Wed, Oct 16, 2024 at 07:14:34PM -0700, John Hubbard wrote:

On 10/16/24 3:06 PM, Lorenzo Stoakes wrote:

On Wed, Oct 16, 2024 at 02:00:27PM -0600, Shuah Khan wrote:

On 10/16/24 04:20, Lorenzo Stoakes wrote:

...

diff --git a/tools/testing/selftests/pidfd/pidfd.h 
b/tools/testing/selftests/pidfd/pidfd.h
index 88d6830ee004..1640b711889b 100644
--- a/tools/testing/selftests/pidfd/pidfd.h
+++ b/tools/testing/selftests/pidfd/pidfd.h
@@ -50,6 +50,14 @@
#define PIDFD_NONBLOCK O_NONBLOCK
#endif
+/* System header file may not have this available. */
+#ifndef PIDFD_SELF_THREAD
+#define PIDFD_SELF_THREAD -100
+#endif
+#ifndef PIDFD_SELF_THREAD_GROUP
+#define PIDFD_SELF_THREAD_GROUP -200
+#endif
+


As mentioned in my response to v1 patch:

kselftest has dependency on "make headers" and tests include
headers from linux/ directory


Right but that assumes you install the kernel headers on the build system,
which is quite a painful thing to have to do when you are quickly iterating
on a qemu setup.

This is a use case I use all the time so not at all theoretical.



This is turning out to be a fairly typical reaction from kernel
developers, when presented with the "you must first run make headers"
requirement for kselftests.

Peter Zijlstra's "NAK NAK NAK" response [1] last year was the most
colorful, so I'll helpfully cite it here. :)


Let me re-try this.

This is driving me insane. I've spend the past _TWO_ days trying to
build KVM selftests and I'm still failing.

This is absolute atrocious crap and is costing me valuable time.

Please fix this fucking selftests shit to just build. This is unusable
garbage.


I don't recall all the reasons why kselftests needed "make headers"
One reason I could think of was that when a new test depends on a
header change, the test won't build unless headers are installed.

If this requirement is causing problems for tests that don't fall
into the category and we probably have more of them mow, we can
clean that up.

John, you mentioned you got mm tests working without headers?
Can you share the commit here.

thanks,
-- Shuah

74 matches

Mail list logo