date:20240618

Re: [PATCH] riscv: Fix early ftrace nop patching

2024-06-18 Thread Andy Chiu

On Tue, Jun 18, 2024 at 9:40 PM Alexandre Ghiti  wrote:
>
> Hi Andy,
>
> On Tue, Jun 18, 2024 at 2:48 PM Andy Chiu  wrote:
> >
> > On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti  wrote:
> > >
> > > Hi Conor,
> > >
> > > On 17/06/2024 15:23, Alexandre Ghiti wrote:
> > > > Hi Conor,
> > > >
> > > > Sorry for the delay here.
> > > >
> > > > On 13/06/2024 09:48, Conor Dooley wrote:
> > > >> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
> > > >>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> > > >>> converted ftrace_make_nop() to use patch_insn_write() which does not
> > > >>> emit any icache flush relying entirely on __ftrace_modify_code() to do
> > > >>> that.
> > > >>>
> > > >>> But we missed that ftrace_make_nop() was called very early directly
> > > >>> when
> > > >>> converting mcount calls into nops (actually on riscv it converts 2B
> > > >>> nops
> > > >>> emitted by the compiler into 4B nops).
> > > >>>
> > > >>> This caused crashes on multiple HW as reported by Conor and Björn 
> > > >>> since
> > > >>> the booting core could have half-patched instructions in its icache
> > > >>> which would trigger an illegal instruction trap: fix this by emitting 
> > > >>> a
> > > >>> local flush icache when early patching nops.
> > > >>>
> > > >>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> > > >>> Signed-off-by: Alexandre Ghiti 
> > > >>> ---
> > > >>>   arch/riscv/include/asm/cacheflush.h | 6 ++
> > > >>>   arch/riscv/kernel/ftrace.c  | 3 +++
> > > >>>   2 files changed, 9 insertions(+)
> > > >>>
> > > >>> diff --git a/arch/riscv/include/asm/cacheflush.h
> > > >>> b/arch/riscv/include/asm/cacheflush.h
> > > >>> index dd8d07146116..ce79c558a4c8 100644
> > > >>> --- a/arch/riscv/include/asm/cacheflush.h
> > > >>> +++ b/arch/riscv/include/asm/cacheflush.h
> > > >>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
> > > >>>   asm volatile ("fence.i" ::: "memory");
> > > >>>   }
> > > >>>   +static inline void local_flush_icache_range(unsigned long start,
> > > >>> +unsigned long end)
> > > >>> +{
> > > >>> +local_flush_icache_all();
> > > >>> +}
> > > >>> +
> > > >>>   #define PG_dcache_clean PG_arch_1
> > > >>> static inline void flush_dcache_folio(struct folio *folio)
> > > >>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > > >>> index 4f4987a6d83d..32e7c401dfb4 100644
> > > >>> --- a/arch/riscv/kernel/ftrace.c
> > > >>> +++ b/arch/riscv/kernel/ftrace.c
> > > >>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
> > > >>> dyn_ftrace *rec)
> > > >>>   out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > > >>>   mutex_unlock(_mutex);
> > > >> So, turns out that this patch is not sufficient. I've seen some more
> > > >> crashes, seemingly due to initialising nops on this mutex_unlock().
> > > >> Palmer suggested moving the if (!mod) ... into the lock, which fixed
> > > >> the problem for me.
> > > >
> > > >
> > > > Ok, it makes sense, I completely missed that. I'll send a fix for that
> > > > shortly so that it can be merged in rc5.
> > > >
> > > > Thanks,
> > > >
> > > > Alex
> > >
> > >
> > > So I digged a bit more and I'm afraid that the only way to make sure
> > > this issue does not happen elsewhere is to flush the icache right after
> > > the patching. We actually can't wait to batch the icache flush since
> > > along the way, we may call a function that has just been patched (the
> > > issue that you encountered here).
> >
> > Trying to provide my thoughts, please let me know if I missed
> > anything. I think what Conor suggested is safe for init_nop, as it
> > would be called only when there is only one core (booting) and at the
> > loading stage of kernel modules. In the first case we just have to
> > make sure there is no patchable entry before the core executes
> > fence.i. The second case is unconditionally safe because there is no
> > read-side of the race.
> >
> > It is a bit tricky for the generic (runtime) case of ftrace code
> > patching, but that is not because of the batch fence.i maintenance. As
> > long as there exists a patchable entry for the stopping thread, it is
> > possible for them to execute on a partially patched instruction. A
> > solution for this is again to prevent any patchable entry in the
> > stop_machine's stopping thread. Another solution is to apply the
> > atomic ftrace patching series which aims to get rid of the race.
>
> Yeah but my worry is that we can't make sure that functions called
> between the patching and the fence.i are not the ones that just get
> patched. At least today, patch_insn_write() etc should be marked as

IIUC, the compiler does not generate a patchable entry for
patch_insn_write, and so do all functions in patch.c. The entire file
is not compiled with patchable entry because the flag is removed in
riscv's Makefile. Please let me know if I misunderstand it.

> noinstr. But

Re: [PATCH v15 05/14] x86/sgx: Implement basic EPC misc cgroup functionality

2024-06-18 Thread Huang, Kai

On Tue, 2024-06-18 at 18:23 -0500, Haitao Huang wrote:
> On Tue, 18 Jun 2024 18:15:37 -0500, Huang, Kai  wrote:
> 
> > On Tue, 2024-06-18 at 07:56 -0500, Haitao Huang wrote:
> > > On Tue, 18 Jun 2024 06:31:09 -0500, Huang, Kai   
> > > wrote:
> > > 
> > > > 
> > > > > @@ -921,7 +956,8 @@ static int __init sgx_init(void)
> > > > >   if (!sgx_page_cache_init())
> > > > >   return -ENOMEM;
> > > > > 
> > > > > - if (!sgx_page_reclaimer_init()) {
> > > > > + if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
> > > > > + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
> > > > >   ret = -ENOMEM;
> > > > >   goto err_page_cache;
> > > > >   }
> > > > 
> > > > This code change is wrong due to two reasons:
> > > > 
> > > > 1) If sgx_page_reclaimer_init() was successful, but sgx_cgroup_init()
> > > > failed, you actually need to 'goto err_kthread' because the ksgxd()
> > > > kernel
> > > > thread is already created and is running.
> > > > 
> > > > 2) There are other cases after here that can also result in  
> > > sgx_init() to
> > > > fail completely, e.g., registering sgx_dev_provision mics device.  We
> > > > need
> > > > to reset the capacity to 0 for those cases as well.
> > > > 
> > > > AFAICT, you need something like:
> > > > 
> > > > diff --git a/arch/x86/kernel/cpu/sgx/main.c
> > > > b/arch/x86/kernel/cpu/sgx/main.c
> > > > index 27892e57c4ef..46f9c26992a7 100644
> > > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > > @@ -930,6 +930,10 @@ static int __init sgx_init(void)
> > > > if (ret)
> > > > goto err_kthread;
> > > > +   ret = sgx_cgroup_init();
> > > > +   if (ret)
> > > > +   goto err_provision;
> > > > +
> > > > /*
> > > >  * Always try to initialize the native *and* KVM drivers.
> > > >  * The KVM driver is less picky than the native one and
> > > > @@ -941,10 +945,12 @@ static int __init sgx_init(void)
> > > > ret = sgx_drv_init();
> > > >if (sgx_vepc_init() && ret)
> > > > -   goto err_provision;
> > > > +   goto err_cgroup;
> > > >return 0;
> > > > +err_cgroup:
> > > > +   /* SGX EPC cgroup cleanup */
> > > >  err_provision:
> > > > misc_deregister(_dev_provision);
> > > > @@ -952,6 +958,8 @@ static int __init sgx_init(void)
> > > > kthread_stop(ksgxd_tsk);
> > > > err_page_cache:
> > > > +   misc_misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
> > > > +
> > > > for (i = 0; i < sgx_nr_epc_sections; i++) {
> > > > vfree(sgx_epc_sections[i].pages);
> > > > memunmap(sgx_epc_sections[i].virt_addr);
> > > > 
> > > > 
> > > > I put the sgx_cgroup_init() before sgx_drv_init() and sgx_vepc_init(),
> > > > otherwise you will need sgx_drv_cleanup() and sgx_vepc_cleanup()
> > > > respectively when sgx_cgroup_init() fails.
> > > > 
> > > 
> > > Yes, good catch.
> > > 
> > > > This looks a little bit weird too, though:
> > > > 
> > > > Calling misc_misc_cg_set_capacity() to reset capacity to 0 is done at  
> > > end
> > > > of sgx_init() error path, because the "set capacity" part is done in
> > > > sgx_epc_cache_init().
> > > > But logically, both "set capacity" and "reset capacity to 0" should be
> > > > SGX
> > > > EPC cgroup operation, so it's more reasonable to do "set capacity" in
> > > > sgx_cgroup_init() and do "reset to 0" in the
> > > > 
> > > > /* SGX EPC cgroup cleanup */
> > > > 
> > > > as shown above.
> > > > 
> > > > Eventually, you will need to do EPC cgroup cleanup anyway, e.g., to  
> > > free
> > > > the workqueue, so it's odd to have two places to handle EPC cgroup
> > > > cleanup.
> > > > 
> > > > I understand the reason "set capacity" part is done in
> > > > sgx_page_cache_init() now is because in that function you can easily  
> > > get
> > > > the capacity.  But the fact is @sgx_numa_nodes also tracks EPC size  
> > > for
> > > > each node, so you can also get the total EPC size from @sgx_numa_node  
> > > in
> > > > sgx_cgroup_init() and set capacity there.
> > > > 
> > > > In this case, you can put "reset capacity to 0" and "free workqueue"
> > > > together as the "SGX EPC cgroup cleanup", which is way more clear  
> > > IMHO.
> > > > 
> > > Okay, will  expose @sgx_numa_nodes to epc_cgroup.c and do the  
> > > calculations
> > > in sgx_cgroup_init().
> > > 
> > 
> > Looks you will also need to expose @sgx_numa_mask, which looks overkill.
> > 
> > Other options:
> > 
> > 1) Expose a function to return total EPC pages/size in "sgx.h".
> > 
> > 2) Move out the new 'capacity' variable in this patch as a global  
> > variable
> > and expose it in "sgx.h" (perhaps rename to 'sgx_total_epc_pages/size').
> > 
> > 3) Make sgx_cgroup_init() to take an argument of total EPC pages/size,  
> > and
> > pass it in sgx_init().  
> > For 3) there are also options to get total EPC pages/size:
> > 
> > a) Move

Re: [PATCH v2] net: missing check virtio

2024-06-18 Thread Jakub Kicinski

On Fri, 14 Jun 2024 13:18:26 +0300 Denis Arefev wrote:
> Yeah, I was thinking of adding Fixes: 
> 
> But this code is new, it complements what is done.
> 1. check (!(ret && (hdr->gso_size > needed) &&
>((remainder > needed) || (remainder == 0 
>  complements comit 0f6925b3e8da0
> 
> 2. The setting of the SKBFL_SHARED_FRAG flag can be associated with this 
> comit cef401de7be8c.
> In the skb_checksum_help function, a check for skb_has_shared_frag has been 
> added.
> If the flag is not set, the skb buffer will remain non-linear, which is not 
> good. 

The Fixes tag indicates how far back the bug would trigger.

Re: [PATCH V1] rpmsg: glink: Make glink smem interrupt wakeup capable

2024-06-18 Thread Bjorn Andersson

On Thu, Jun 13, 2024 at 04:05:17PM +0530, Deepak Kumar Singh wrote:
> 
> 
> On 6/3/2024 3:07 PM, Caleb Connolly wrote:
> > Hi Deepak,
> > 
> > On 03/06/2024 09:36, Deepak Kumar Singh wrote:
> > > There are certain usecases which require glink interrupt to be
> > > wakeup capable. For example if handset is in sleep state and
> > > usb charger is plugged in, dsp wakes up and sends glink interrupt
> > > to host for glink pmic channel communication. Glink is suppose to
> > > wakeup host processor completely for further glink data handling.
> > > IRQF_NO_SUSPEND does not gurantee complete wakeup, system may again
> > > enter sleep after interrupt handling and glink data may not be
> > > handled by pmic client driver.
> > > 
> > > To ensure data handling by client configure glink smem device as
> > > wakeup source and attach glink interrupt as wakeup irq. Remove
> > > IRQF_NO_SUSPEND flag as it is no longer required.
> > 
> > I'm not sure I agree with this approach, glink is used for lots of
> > things -- like QRTR, where the sensor DSP and modem may also need to
> > wake the system up (e.g. for "wake on pickup" on mobile, or for incoming
> > calls/sms).
> > 
> > Configuring this to always wake up the system fully will result in a lot
> > of spurious wakeups for arbitrary modem notifications (e.g. signal
> > strength changes) if userspace hasn't properly configured these
> > (something ModemManager currently lacks support for).
> 
> In internal testing at least we don't see such issues, may be downstream
> modem manager is configuring things properly.

As we discussed during the introduction of 1a561c521ba9 ("soc: qcom:
smp2p: Add wakeup capability to SMP2P IRQ"), we don't want a laptop-like
device to wake up in someones backpack and overheat.

If there are gaps in upstream ModemManager it would be desirable to see
those closed, but it seems likely that we have other services doing
similar things?

> Also with devices having
> proper auto suspend feature this change may not be affecting power numbers
> significantly.

There are many types of products where you don't have auto suspend.

> 
> Additionally my understanding is by definition glink interrupt should be
> wakeup capable. May be Bjorn can comment more on this.
> 

That sounds correct, but it was made under the assumption that the apps
software does auto suspend.

Regards,
Bjorn

> Thanks,
> Deepak
> > 
> > IRQF_NO_SUSPEND is presumably necessary to keep the DSPs happy? iirc
> > downstream Qualcomm kernels have historically taken this approach to
> > avoid spurious wakeups.
> > 
> > I proposed an alternative approach some time back that would allow the
> > wakeup to be configured on a per-channel basis.
> > 
> > https://lore.kernel.org/linux-arm-msm/20230117142414.983946-1-caleb.conno...@linaro.org/
> > 
> > Back then Bjorn proposed using some socket specific mechanism to handle
> > this for QRTR, but given this is now a common issue for multiple glink
> > channels, maybe it's something we could revisit.
> > 
> > Requiring the wakeup be enabled by userspace clearly doesn't make sense
> > for your proposed usecase, perhaps there's a way to configure this on a
> > per-channel basis in-kernel (maybe as the rpmsg API?).
> > 
> > Thanks and regards,
> > > 
> > > Signed-off-by: Deepak Kumar Singh 
> > > ---
> > >   drivers/rpmsg/qcom_glink_smem.c | 8 ++--
> > >   1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/rpmsg/qcom_glink_smem.c
> > > b/drivers/rpmsg/qcom_glink_smem.c
> > > index 7a982c60a8dd..f1b553efab13 100644
> > > --- a/drivers/rpmsg/qcom_glink_smem.c
> > > +++ b/drivers/rpmsg/qcom_glink_smem.c
> > > @@ -22,6 +22,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > >   #include 
> > > @@ -306,8 +307,7 @@ struct qcom_glink_smem
> > > *qcom_glink_smem_register(struct device *parent,
> > >   smem->irq = of_irq_get(smem->dev.of_node, 0);
> > >   ret = devm_request_irq(>dev, smem->irq, qcom_glink_smem_intr,
> > > -   IRQF_NO_SUSPEND | IRQF_NO_AUTOEN,
> > > -   "glink-smem", smem);
> > > +   IRQF_NO_AUTOEN, "glink-smem", smem);
> > >   if (ret) {
> > >   dev_err(>dev, "failed to request IRQ\n");
> > >   goto err_put_dev;
> > > @@ -346,6 +346,8 @@ struct qcom_glink_smem
> > > *qcom_glink_smem_register(struct device *parent,
> > >   smem->glink = glink;
> > > +    device_init_wakeup(dev, true);
> > > +    dev_pm_set_wake_irq(dev, smem->irq);
> > >   enable_irq(smem->irq);
> > >   return smem;
> > > @@ -365,6 +367,8 @@ void qcom_glink_smem_unregister(struct
> > > qcom_glink_smem *smem)
> > >   struct qcom_glink *glink = smem->glink;
> > >   disable_irq(smem->irq);
> > > +    dev_pm_clear_wake_irq(>dev);
> > > +    device_init_wakeup(>dev, false);
> > >   qcom_glink_native_remove(glink);
> >

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Luis Chamberlain

On Tue, Jun 18, 2024 at 02:19:47PM -0700, Sami Tolvanen wrote:
> Hi Luis,
> 
> On Tue, Jun 18, 2024 at 12:42:51PM -0700, Luis Chamberlain wrote:
> > a) Ensure correctness for all users / tools, so that proper plumbing is
> >really done. By considering all symbols you increase your scope of
> >awareness of anything that could really break.
> > 
> > b) Remove the "Rust" nature about this
> > 
> > c) Rust modules just becomes a *user* of this approach
> 
> I believe the only Rust nature here is the last patch that enables
> gendwarfksyms only for Rust. Otherwise, there shouldn't be anything
> strictly Rust specific.

Right now the check for length is generic, and assumes a hash may exist
without considering that check is moot for non-rust modules. So the
inverse is true, but doesn't provide a solution or proper architecture
for it.

> > It gets me wondering, given Kris is also working on allowing traces to
> > map symbols to module names, how does this fit into that world [0]?
> 
> AFAIK long symbol names are only a problem for struct modversion_info,
> which uses a relatively short name buffer, so I'm hoping other efforts
> won't be affected.

We'll see!

> > As for a) the reason I'm thinking about having the ability to test a
> > full real kernel and moules with this is, without that, how are you sure
> > you have the full scope of the changes needed?
> 
> Sure, I can look into hooking this up for the entire kernel and seeing
> what breaks, in addition the issues Masahiro pointed out, of course.

:) should be fun!

I think once its revised as a "generic" strategy and not a Rust one, the
proper architecture will be considered. Right now, it just looks like a
cheap band aid for Rust.

  Luis

Re: [PATCH v15 05/14] x86/sgx: Implement basic EPC misc cgroup functionality

2024-06-18 Thread Haitao Huang


On Tue, 18 Jun 2024 18:15:37 -0500, Huang, Kai  wrote:


On Tue, 2024-06-18 at 07:56 -0500, Haitao Huang wrote:
On Tue, 18 Jun 2024 06:31:09 -0500, Huang, Kai   
wrote:


>
> > @@ -921,7 +956,8 @@ static int __init sgx_init(void)
> >   if (!sgx_page_cache_init())
> >   return -ENOMEM;
> >
> > - if (!sgx_page_reclaimer_init()) {
> > + if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
> > + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
> >   ret = -ENOMEM;
> >   goto err_page_cache;
> >   }
>
> This code change is wrong due to two reasons:
>
> 1) If sgx_page_reclaimer_init() was successful, but sgx_cgroup_init()
> failed, you actually need to 'goto err_kthread' because the ksgxd()
> kernel
> thread is already created and is running.
>
> 2) There are other cases after here that can also result in  
sgx_init() to

> fail completely, e.g., registering sgx_dev_provision mics device.  We
> need
> to reset the capacity to 0 for those cases as well.
>
> AFAICT, you need something like:
>
> diff --git a/arch/x86/kernel/cpu/sgx/main.c
> b/arch/x86/kernel/cpu/sgx/main.c
> index 27892e57c4ef..46f9c26992a7 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -930,6 +930,10 @@ static int __init sgx_init(void)
> if (ret)
> goto err_kthread;
> +   ret = sgx_cgroup_init();
> +   if (ret)
> +   goto err_provision;
> +
> /*
>  * Always try to initialize the native *and* KVM drivers.
>  * The KVM driver is less picky than the native one and
> @@ -941,10 +945,12 @@ static int __init sgx_init(void)
> ret = sgx_drv_init();
>if (sgx_vepc_init() && ret)
> -   goto err_provision;
> +   goto err_cgroup;
>return 0;
> +err_cgroup:
> +   /* SGX EPC cgroup cleanup */
>  err_provision:
> misc_deregister(_dev_provision);
> @@ -952,6 +958,8 @@ static int __init sgx_init(void)
> kthread_stop(ksgxd_tsk);
> err_page_cache:
> +   misc_misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
> +
> for (i = 0; i < sgx_nr_epc_sections; i++) {
> vfree(sgx_epc_sections[i].pages);
> memunmap(sgx_epc_sections[i].virt_addr);
>
>
> I put the sgx_cgroup_init() before sgx_drv_init() and sgx_vepc_init(),
> otherwise you will need sgx_drv_cleanup() and sgx_vepc_cleanup()
> respectively when sgx_cgroup_init() fails.
>

Yes, good catch.

> This looks a little bit weird too, though:
>
> Calling misc_misc_cg_set_capacity() to reset capacity to 0 is done at  
end

> of sgx_init() error path, because the "set capacity" part is done in
> sgx_epc_cache_init().
> But logically, both "set capacity" and "reset capacity to 0" should be
> SGX
> EPC cgroup operation, so it's more reasonable to do "set capacity" in
> sgx_cgroup_init() and do "reset to 0" in the
>
>/* SGX EPC cgroup cleanup */
>
> as shown above.
>
> Eventually, you will need to do EPC cgroup cleanup anyway, e.g., to  
free

> the workqueue, so it's odd to have two places to handle EPC cgroup
> cleanup.
>
> I understand the reason "set capacity" part is done in
> sgx_page_cache_init() now is because in that function you can easily  
get
> the capacity.  But the fact is @sgx_numa_nodes also tracks EPC size  
for
> each node, so you can also get the total EPC size from @sgx_numa_node  
in

> sgx_cgroup_init() and set capacity there.
>
> In this case, you can put "reset capacity to 0" and "free workqueue"
> together as the "SGX EPC cgroup cleanup", which is way more clear  
IMHO.

>
Okay, will  expose @sgx_numa_nodes to epc_cgroup.c and do the  
calculations

in sgx_cgroup_init().



Looks you will also need to expose @sgx_numa_mask, which looks overkill.

Other options:

1) Expose a function to return total EPC pages/size in "sgx.h".

2) Move out the new 'capacity' variable in this patch as a global  
variable

and expose it in "sgx.h" (perhaps rename to 'sgx_total_epc_pages/size').

3) Make sgx_cgroup_init() to take an argument of total EPC pages/size,  
and
pass it in sgx_init().  
For 3) there are also options to get total EPC pages/size:


a) Move out the new 'capacity' variable in this patch as a static.

b) Add a function to calculate total EPC pages/size from sgx_numa_nodes.

Hmm.. I think we can just use option 2)?



I was  about doing this in sgx_cgroup_init():
for (i = 0; i < num_possible_nodes(); i++)
capacity += sgx_numa_nodes[i].size;

any concern using num_possible_nodes()?

I think case handled in sgx_page_cache_init() for a node with no epc (or  
mask). Only requirement is sgx_cgroup_init() called after  
sgx_page_cache_init().

Haitao

Re: [PATCH v15 05/14] x86/sgx: Implement basic EPC misc cgroup functionality

2024-06-18 Thread Huang, Kai

On Tue, 2024-06-18 at 07:56 -0500, Haitao Huang wrote:
> On Tue, 18 Jun 2024 06:31:09 -0500, Huang, Kai  wrote:
> 
> > 
> > > @@ -921,7 +956,8 @@ static int __init sgx_init(void)
> > >   if (!sgx_page_cache_init())
> > >   return -ENOMEM;
> > > 
> > > - if (!sgx_page_reclaimer_init()) {
> > > + if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
> > > + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
> > >   ret = -ENOMEM;
> > >   goto err_page_cache;
> > >   }
> > 
> > This code change is wrong due to two reasons:
> > 
> > 1) If sgx_page_reclaimer_init() was successful, but sgx_cgroup_init()
> > failed, you actually need to 'goto err_kthread' because the ksgxd()  
> > kernel
> > thread is already created and is running.
> > 
> > 2) There are other cases after here that can also result in sgx_init() to
> > fail completely, e.g., registering sgx_dev_provision mics device.  We  
> > need
> > to reset the capacity to 0 for those cases as well.
> > 
> > AFAICT, you need something like:
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c
> > b/arch/x86/kernel/cpu/sgx/main.c
> > index 27892e57c4ef..46f9c26992a7 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -930,6 +930,10 @@ static int __init sgx_init(void)
> > if (ret)
> > goto err_kthread;
> > +   ret = sgx_cgroup_init();
> > +   if (ret)
> > +   goto err_provision;
> > +
> > /*
> >  * Always try to initialize the native *and* KVM drivers.
> >  * The KVM driver is less picky than the native one and
> > @@ -941,10 +945,12 @@ static int __init sgx_init(void)
> > ret = sgx_drv_init();
> >if (sgx_vepc_init() && ret)
> > -   goto err_provision;
> > +   goto err_cgroup;
> >return 0;
> > +err_cgroup:
> > +   /* SGX EPC cgroup cleanup */
> >  err_provision:
> > misc_deregister(_dev_provision);
> > @@ -952,6 +958,8 @@ static int __init sgx_init(void)
> > kthread_stop(ksgxd_tsk);
> > err_page_cache:
> > +   misc_misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
> > +
> > for (i = 0; i < sgx_nr_epc_sections; i++) {
> > vfree(sgx_epc_sections[i].pages);
> > memunmap(sgx_epc_sections[i].virt_addr);
> > 
> > 
> > I put the sgx_cgroup_init() before sgx_drv_init() and sgx_vepc_init(),
> > otherwise you will need sgx_drv_cleanup() and sgx_vepc_cleanup()
> > respectively when sgx_cgroup_init() fails.
> > 
> 
> Yes, good catch.
> 
> > This looks a little bit weird too, though:
> > 
> > Calling misc_misc_cg_set_capacity() to reset capacity to 0 is done at end
> > of sgx_init() error path, because the "set capacity" part is done in
> > sgx_epc_cache_init().  
> > But logically, both "set capacity" and "reset capacity to 0" should be  
> > SGX
> > EPC cgroup operation, so it's more reasonable to do "set capacity" in
> > sgx_cgroup_init() and do "reset to 0" in the
> > 
> > /* SGX EPC cgroup cleanup */
> > 
> > as shown above.
> > 
> > Eventually, you will need to do EPC cgroup cleanup anyway, e.g., to free
> > the workqueue, so it's odd to have two places to handle EPC cgroup
> > cleanup.
> > 
> > I understand the reason "set capacity" part is done in
> > sgx_page_cache_init() now is because in that function you can easily get
> > the capacity.  But the fact is @sgx_numa_nodes also tracks EPC size for
> > each node, so you can also get the total EPC size from @sgx_numa_node in
> > sgx_cgroup_init() and set capacity there.
> > 
> > In this case, you can put "reset capacity to 0" and "free workqueue"
> > together as the "SGX EPC cgroup cleanup", which is way more clear IMHO.
> > 
> Okay, will  expose @sgx_numa_nodes to epc_cgroup.c and do the calculations  
> in sgx_cgroup_init().
> 

Looks you will also need to expose @sgx_numa_mask, which looks overkill.

Other options:

1) Expose a function to return total EPC pages/size in "sgx.h".

2) Move out the new 'capacity' variable in this patch as a global variable
and expose it in "sgx.h" (perhaps rename to 'sgx_total_epc_pages/size').

3) Make sgx_cgroup_init() to take an argument of total EPC pages/size, and
pass it in sgx_init().  

For 3) there are also options to get total EPC pages/size:

a) Move out the new 'capacity' variable in this patch as a static.

b) Add a function to calculate total EPC pages/size from sgx_numa_nodes.

Hmm.. I think we can just use option 2)?

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Sami Tolvanen

Hi Luis,

On Tue, Jun 18, 2024 at 12:42:51PM -0700, Luis Chamberlain wrote:
> On Mon, Jun 17, 2024 at 05:58:19PM +, Sami Tolvanen wrote:
> > The first 12 patches of this series add a small tool for computing
> > symbol versions from DWARF, called gendwarfksyms. When passed a list
> > of exported symbols, the tool generates an expanded type string
> > for each symbol, and computes symbol CRCs similarly to genksyms.
> 
> So this is too word centric Rust, let's think about this generically.
> We still ahve a symbol limitation even in the C world then, and this
> solution can solve that problem also for other reasons for *whatever*
> reason we devise to-come-up-with-in-the-future for augmenting symbols.
> Today Rust, tomorrow, who knows.

If you're referring to the symbol hashing in the __versions table,
that's not specific to Rust. Rust just happens to be the only source of
long symbol names right now.

> > gendwarfksyms is written in C and uses libdw to process DWARF, mainly
> > because of the existing support for C host tools that use elfutils
> > (e.g., objtool).
> 
> I agree with Masahiro, that testing this with vmlinux would be eye
> opening to what challenges we really have ahead. So, to help with this
> let's try to re-think about this  from another perspective.
>
> Yes, old userspace should not break, but you can add yet another option
> to let you opt-in to a new world order of how these crc are mapped to
> hashed repersentations of the symbols. This would allow us to:

We did run into an issue with depmod in our testing, where it needs to
be taught about hashed names to avoid 'unknown symbol' warnings. I'm not
sure if this is acceptable, so I would like to hear feedback about the
viability of the hashing scheme in general.

If old userspace can't have any regressions because of long symbol
names, I suspect we'll have to go back to omitting long symbols from
struct modversion_info and adding a v2 of the __versions table with no
symbol name length limitations. Happy to hear your thoughts.

> a) Ensure correctness for all users / tools, so that proper plumbing is
>really done. By considering all symbols you increase your scope of
>awareness of anything that could really break.
> 
> b) Remove the "Rust" nature about this
> 
> c) Rust modules just becomes a *user* of this approach

I believe the only Rust nature here is the last patch that enables
gendwarfksyms only for Rust. Otherwise, there shouldn't be anything
strictly Rust specific.

> It gets me wondering, given Kris is also working on allowing traces to
> map symbols to module names, how does this fit into that world [0]?

AFAIK long symbol names are only a problem for struct modversion_info,
which uses a relatively short name buffer, so I'm hoping other efforts
won't be affected.

> As for a) the reason I'm thinking about having the ability to test a
> full real kernel and moules with this is, without that, how are you sure
> you have the full scope of the changes needed?

Sure, I can look into hooking this up for the entire kernel and seeing
what breaks, in addition the issues Masahiro pointed out, of course.

Sami

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Sami Tolvanen

On Wed, Jun 19, 2024 at 04:03:45AM +0900, Masahiro Yamada wrote:
> On Wed, Jun 19, 2024 at 2:18 AM Greg Kroah-Hartman
>  wrote:
> >
> > On Wed, Jun 19, 2024 at 01:50:36AM +0900, Masahiro Yamada wrote:
> > > On Wed, Jun 19, 2024 at 1:44 AM Greg Kroah-Hartman
> > >  wrote:
> > > >
> > > > That's cool, can the C code be switched to also use this?  That way we
> > > > only have one path/code for all of this?
> > >
> > >
> > > As the description says, it requires CONFIG_DEBUG_INFO.
> > > We can strip the debug info from the final vmlinux, but
> > > I guess the build speed will be even slower than the current genksyms.
> >
> > For people who want genksyms (i.e. distros), don't they normally already
> > enable DEBUG_INFO as well?  The problems of genksyms are well known and
> > a pain (I speak from experience), so replacing it with info based on
> > DWARF would be great, I'll gladly trade off the DEBUG_INFO issue for
> > stablilty!
> >
> > thanks,
> >
> > greg k-h
> >
> 
> 
> 
> I do not think gendwarfksyms is a drop-in replacement,
> because it relies on libelf and libdw, which will not
> work with LLVM bitcode when CONFIG_LTO_CLANG=y.
> 
> His "Let's postpone this until final linking" stuff will
> come back?
> Then, vmlinux.o is processed to extract the CRC
> of all symbols?

I agree, this won't work with LTO unless we process vmlinux.o.

> In my benchmark, this tool took 3.84 sec just for processing
> a single rust/core.o object.

To be fair, Rust currently exports all globals and core.o has 400
exported symbols as a result. During my brief testing, this tool is
faster than genksyms for normal C code.

> I'd love to see how long it will take to process vmlinux.o

It's obviously going to be quite slow, my defconfig vmlinux.o has
14k exported symbols:

 Performance counter stats for './tools/gendwarfksyms/gendwarfksyms vmlinux.o':

371,527.67 msec task-clock:u #1.000 CPUs 
utilized
 0  context-switches:u   #0.000 /sec
 0  cpu-migrations:u #0.000 /sec
   231,554  page-faults:u#  623.248 /sec
 cycles:u
 instructions:u
 branches:u
 branch-misses:u

 371.686151684 seconds time elapsed

 370.534637000 seconds user
   0.987825000 seconds sys

The tool is currently single-threaded, so if we really want to go this
route, it could probably be made a bit faster.

> And this occurs even when a single source file is changed
> and vmlinux.o is re-linked.

I suppose anyone using LTO already knows it won't be a quick rebuild
though.

Sami

Re: [PATCH 13/15] modpost: Add support for hashing long symbol names

2024-06-18 Thread Sami Tolvanen

On Wed, Jun 19, 2024 at 01:47:13AM +0900, Masahiro Yamada wrote:
> On Tue, Jun 18, 2024 at 2:58 AM Sami Tolvanen  wrote:
> >
> > +#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
> 
> 
> 
> 
> I know this is a copy-paste of the kernel space code,
> but is barrier_data() also necessary for host programs?

I tested this with Clang and it worked fine without the barrier, but
with gcc, the code no longer produces correct values. Presumably this is
a load bearing data barrier.

Sami

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Sami Tolvanen

Hi Masahiro,

On Wed, Jun 19, 2024 at 01:28:21AM +0900, Masahiro Yamada wrote:
> I am surprised at someone who attempts to add another variant of genksyms.

The options are rather limited if we want Rust modules that are
compatible with modversions. We either come up with a way to version
Rust symbols or we bypass version checks for them, which is basically
what Matt's earlier patch set did:

https://lore.kernel.org/rust-for-linux/20231118025748.2778044-1-mmau...@google.com/

If there are better solutions, I would be happy to hear them.

> I am also surprised at the tool being added under the tools/ directory.

I picked this location because that's where objtool lives, and wasn't
aware that this was frowned upon. I'm fine with moving the tool
elsewhere too.

Sami

Re: [PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module

2024-06-18 Thread Luis Chamberlain

On Thu, Jun 06, 2024 at 03:31:49PM +0200, Daniel v. Kirschten wrote:
> If a module is being loaded, and the .gnu.linkonce.this_module section
> in the module's ELF file does not have the WRITE flag, the kernel will
> map the finished module struct of that module as read-only.
> This causes a kernel panic when the struct is written to the first time
> after it has been marked read-only. Currently this happens in
> complete_formation in kernel/module/main.c:2765 when the module's state is
> set to MODULE_STATE_COMING, just after setting up the memory protections.

How did you find this issue?

> Down the line, this seems to lead to unpredictable freezes when trying to
> load other modules - I guess this is due to some structures not being
> cleaned up properly, but I didn't investigate this further.
> 
> A check already exists which verifies that .gnu.linkonce.this_module
> is ALLOC. This patch simply adds an analogous check for WRITE.

Can you check to ensure our modules generated have a respective check to
ensure this check exists at build time? That would proactively inform
userspace when a built module is not built correctly, and the tool
responsible can be identified.

  Luis

Re: [PATCH] filemap: add trace events for get_pages, map_pages, and fault

2024-06-18 Thread Steven Rostedt

On Tue, 18 Jun 2024 09:36:56 +
Takaya Saeki  wrote:

> To allow precise tracking of page caches accessed, add new tracepoints
> that trigger when a process actually accesses them.
> 
> The ureadahead program used by ChromeOS traces the disk access of
> programs as they start up at boot up. It uses mincore(2) or the
> 'mm_filemap_add_to_page_cache' trace event to accomplish this. It stores
> this information in a "pack" file and on subsequent boots, it will read
> the pack file and call readahead(2) on the information so that disk
> storage can be loaded into RAM before the applications actually need it.
> 
> A problem we see is that due to the kernel's readahead algorithm that
> can aggressively pull in more data than needed (to try and accomplish
> the same goal) and this data is also recorded. The end result is that
> the pack file contains a lot of pages on disk that are never actually
> used. Calling readahead(2) on these unused pages can slow down the
> system boot up times.
> 
> To solve this, add 3 new trace events, get_pages, map_pages, and fault.
> These will be used to trace the pages are not only pulled in from disk,
> but are actually used by the application. Only those pages will be
> stored in the pack file, and this helps out the performance of boot up.
> 
> With the combination of these 3 new trace events and
> mm_filemap_add_to_page_cache, we observed a reduction in the pack file
> by 7.3% - 20% on ChromeOS varying by device.
> 
> Signed-off-by: Takaya Saeki 

Thanks Takaya.

I just applied this and ran:

 # trace-cmd start -e filemap

did a less and then:

 # trace-cmd show | grep less-901 | grep 13f730
less-901 [005] .72.531607: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x113bad ofs=0 order=0
less-901 [005] .72.531613: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x113bcc ofs=4096 order=0
less-901 [005] .72.531617: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x113b19 ofs=8192 order=0
less-901 [005] .72.531620: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x114970 ofs=12288 
order=0
less-901 [005] .72.532039: mm_filemap_get_pages: dev 
253:3 ino 13f730 ofs=0 max_ofs=4096
less-901 [005] .72.532104: mm_filemap_get_pages: dev 
253:3 ino 13f730 ofs=0 max_ofs=4096
less-901 [005] .72.532107: mm_filemap_get_pages: dev 
253:3 ino 13f730 ofs=0 max_ofs=4096
less-901 [005] .72.532497: mm_filemap_fault: dev 253:3 
ino 13f730 ofs=196608
less-901 [005] .72.532503: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x111232 ofs=131072 
order=0
less-901 [005] .72.532505: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x110099 ofs=135168 
order=0
less-901 [005] .72.532506: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x11496b ofs=139264 
order=0
less-901 [005] .72.532509: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x10b061 ofs=143360 
order=0
less-901 [005] .72.532511: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x111230 ofs=147456 
order=0
less-901 [005] .72.532512: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x110c88 ofs=151552 
order=0
less-901 [005] .72.532515: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x1119ef ofs=155648 
order=0
less-901 [005] .72.532518: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x111b49 ofs=159744 
order=0
less-901 [005] .72.532519: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x11498b ofs=163840 
order=0
less-901 [005] .72.532521: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x1149ab ofs=167936 
order=0
less-901 [005] .72.532523: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x113bcb ofs=172032 
order=0
less-901 [005] .72.532525: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x1119f3 ofs=176128 
order=0
less-901 [005] .72.532527: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x1171d0 ofs=180224 
order=0
less-901 [005] .72.532529: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x112ae7 ofs=184320 
order=0
less-901 [005] .72.532531: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x113bc0 ofs=188416 
order=0
less-901 [005] .72.532534: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x111ad8 ofs=192512 
order=0
less-901 [005] .72.532535: 
mm_filemap_add_to_page_cache: dev 253:3 ino 13f730 pfn=0x1149b6 ofs=196608 
order=0

I'm guessing the above is the kernel "readahead" kicking

Re: [PATCH v3] module: create weak dependecies

2024-06-18 Thread Luis Chamberlain

On Tue, May 14, 2024 at 09:25:55AM -0500, Lucas De Marchi wrote:
> On Fri, May 10, 2024 at 10:57:22AM GMT, Jose Ignacio Tornos Martinez wrote:
> > It has been seen that for some network mac drivers (i.e. lan78xx) the
> > related module for the phy is loaded dynamically depending on the current
> > hardware. In this case, the associated phy is read using mdio bus and then
> > the associated phy module is loaded during runtime (kernel function
> > phy_request_driver_module). However, no software dependency is defined, so
> > the user tools will no be able to get this dependency. For example, if
> > dracut is used and the hardware is present, lan78xx will be included but no
> > phy module will be added, and in the next restart the device will not work
> > from boot because no related phy will be found during initramfs stage.
> > 
> > In order to solve this, we could define a normal 'pre' software dependency
> > in lan78xx module with all the possible phy modules (there may be some),
> > but proceeding in that way, all the possible phy modules would be loaded
> > while only one is necessary.
> > 
> > The idea is to create a new type of dependency, that we are going to call
> > 'weak' to be used only by the user tools that need to detect this situation.
> > In that way, for example, dracut could check the 'weak' dependency of the
> > modules involved in order to install these dependencies in initramfs too.
> > That is, for the commented lan78xx module, defining the 'weak' dependency
> > with the possible phy modules list, only the necessary phy would be loaded
> > on demand keeping the same behavior, but all the possible phy modules would
> > be available from initramfs.
> > 
> > The 'weak' dependency support has been included in kmod:
> > https://github.com/kmod-project/kmod/commit/05828b4a6e9327a63ef94df544a042b5e9ce4fe7
> > But, take into account that this can only be used if depmod is new enough.
> > If it isn't, depmod will have the same behavior as always (keeping backward
> > compatibility) and the information for the 'weak' dependency will not be
> > provided.
> > 
> > Signed-off-by: Jose Ignacio Tornos Martinez 
> 
> 
> Reviewed-by: Lucas De Marchi 

Thanks! Applied and pushed to modules-next.

  Luis

[PATCH] uprobe: Do not use UPROBE_SWBP_INSN as static initializer

2024-06-18 Thread Jiri Olsa

Nathan reported compilation fail for loongarch arch:

  kernel/events/uprobes.c: In function 'arch_uprobe_trampoline':
  arch/loongarch/include/asm/uprobes.h:12:33: error: initializer element is not 
constant
 12 | #define UPROBE_SWBP_INSNlarch_insn_gen_break(BRK_UPROBE_BP)
| ^~~~
  kernel/events/uprobes.c:1479:39: note: in expansion of macro 
'UPROBE_SWBP_INSN'
   1479 | static uprobe_opcode_t insn = UPROBE_SWBP_INSN;

Loongarch defines UPROBE_SWBP_INSN as function call, so we can't
use it to initialize static variable.

Cc: Oleg Nesterov 
Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe")
Reported-by: Nathan Chancellor 
Signed-off-by: Jiri Olsa 
---
 kernel/events/uprobes.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2816e65729ac..6986bd993702 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1476,8 +1476,9 @@ static int xol_add_vma(struct mm_struct *mm, struct 
xol_area *area)
 
 void * __weak arch_uprobe_trampoline(unsigned long *psize)
 {
-   static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
+   static uprobe_opcode_t insn;
 
+   insn = insn ?: UPROBE_SWBP_INSN;
*psize = UPROBE_SWBP_INSN_SIZE;
return 
 }
-- 
2.45.1

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Luis Chamberlain

On Mon, Jun 17, 2024 at 05:58:19PM +, Sami Tolvanen wrote:
> Hi folks,
> 
> This series implements CONFIG_MODVERSIONS for Rust, an important
> feature for distributions like Android that want to ship Rust
> kernel modules, and depend on modversions to help ensure module ABI
> compatibility.
> 
> There have been earlier proposals [1][2] that would allow Rust
> modules to coexist with modversions, but none that actually implement
> symbol versioning. Unlike C, Rust source code doesn't have sufficient
> information about the final ABI, as the compiler has considerable
> freedom in adjusting structure layout for improved performance [3],
> for example, which makes using a source code parser like genksyms
> a non-starter. Based on Matt's suggestion and previous feedback
> from maintainers, this series uses DWARF debugging information for
> computing versions. DWARF is an established and relatively stable
> format, which includes all the necessary ABI details, and adding a
> CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a
> reasonable trade-off.

OK sure.

> The first 12 patches of this series add a small tool for computing
> symbol versions from DWARF, called gendwarfksyms. When passed a list
> of exported symbols, the tool generates an expanded type string
> for each symbol, and computes symbol CRCs similarly to genksyms.

So this is too word centric Rust, let's think about this generically.
We still ahve a symbol limitation even in the C world then, and this
solution can solve that problem also for other reasons for *whatever*
reason we devise to-come-up-with-in-the-future for augmenting symbols.
Today Rust, tomorrow, who knows.

> gendwarfksyms is written in C and uses libdw to process DWARF, mainly
> because of the existing support for C host tools that use elfutils
> (e.g., objtool).

I agree with Masahiro, that testing this with vmlinux would be eye
opening to what challenges we really have ahead. So, to help with this
let's try to re-think about this  from another perspective.

Yes, old userspace should not break, but you can add yet another option
to let you opt-in to a new world order of how these crc are mapped to
hashed repersentations of the symbols. This would allow us to:

a) Ensure correctness for all users / tools, so that proper plumbing is
   really done. By considering all symbols you increase your scope of
   awareness of anything that could really break.

b) Remove the "Rust" nature about this

c) Rust modules just becomes a *user* of this approach

It gets me wondering, given Kris is also working on allowing traces to
map symbols to module names, how does this fit into that world [0]?

As for a) the reason I'm thinking about having the ability to test a
full real kernel and moules with this is, without that, how are you sure
you have the full scope of the changes needed?

[0] https://lkml.kernel.org/r/20240614171428.968174-3-kris.van.h...@oracle.com

  Luis

Re: [PATCH] bpf/selftests: Fix __NR_uretprobe in uprobe_syscall test

2024-06-18 Thread Jiri Olsa

On Sun, Jun 16, 2024 at 09:51:18PM +0200, Jiri Olsa wrote:
> On Sun, Jun 16, 2024 at 01:19:11AM +0900, Masami Hiramatsu wrote:
> > On Sun, 16 Jun 2024 00:19:20 +0900
> > Masami Hiramatsu (Google)  wrote:
> > 
> > > On Fri, 14 Jun 2024 12:15:09 +0200
> > > Jiri Olsa  wrote:
> > > 
> > > > Fixing the __NR_uretprobe number in uprobe_syscall test,
> > > > because it changed due to merge conflict.
> > > > 
> > > 
> > > Ah, it is not enough, since Stephen's change is just a temporary fix on
> > > next tree. OK, Let me update it.
> > 
> > Hm, I thought I need to change all NR_uretprobe, but it makes NR_syscalls
> > list sparse. This may need to be solved on linus tree in merge window,
> > or I should merge (or rebase on) vfs-brauner tree before sending
> > probes/for-next.
> > 
> > Steve, do you have any idea? we talked about conflict on next tree[0].
> > 
> > [0] https://lore.kernel.org/all/20240613114243.2a500...@canb.auug.org.au/
> 
> hi,
> I have one more fix to send [1] for this, please let me know which tree
> I should based that on

hi,
any news on this?

thanks,
jirka

> 
> thanks,
> jirka
> 
> 
> [1] https://lore.kernel.org/bpf/ZmyZgzqsowkGyqmH@krava/
> 
> > 
> > Thanks,
> > 
> > > 
> > > Thanks,
> > > 
> > > > Signed-off-by: Jiri Olsa 
> > > > ---
> > > >  tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c 
> > > > b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > > > index c8517c8f5313..bd8c75b620c2 100644
> > > > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > > > @@ -216,7 +216,7 @@ static void test_uretprobe_regs_change(void)
> > > >  }
> > > >  
> > > >  #ifndef __NR_uretprobe
> > > > -#define __NR_uretprobe 463
> > > > +#define __NR_uretprobe 467
> > > >  #endif
> > > >  
> > > >  __naked unsigned long uretprobe_syscall_call_1(void)
> > > > -- 
> > > > 2.45.1
> > > > 
> > > 
> > > 
> > > -- 
> > > Masami Hiramatsu (Google) 
> > 
> > 
> > -- 
> > Masami Hiramatsu (Google)

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Masahiro Yamada

On Wed, Jun 19, 2024 at 2:18 AM Greg Kroah-Hartman
 wrote:
>
> On Wed, Jun 19, 2024 at 01:50:36AM +0900, Masahiro Yamada wrote:
> > On Wed, Jun 19, 2024 at 1:44 AM Greg Kroah-Hartman
> >  wrote:
> > >
> > > On Mon, Jun 17, 2024 at 05:58:19PM +, Sami Tolvanen wrote:
> > > > Hi folks,
> > > >
> > > > This series implements CONFIG_MODVERSIONS for Rust, an important
> > > > feature for distributions like Android that want to ship Rust
> > > > kernel modules, and depend on modversions to help ensure module ABI
> > > > compatibility.
> > > >
> > > > There have been earlier proposals [1][2] that would allow Rust
> > > > modules to coexist with modversions, but none that actually implement
> > > > symbol versioning. Unlike C, Rust source code doesn't have sufficient
> > > > information about the final ABI, as the compiler has considerable
> > > > freedom in adjusting structure layout for improved performance [3],
> > > > for example, which makes using a source code parser like genksyms
> > > > a non-starter. Based on Matt's suggestion and previous feedback
> > > > from maintainers, this series uses DWARF debugging information for
> > > > computing versions. DWARF is an established and relatively stable
> > > > format, which includes all the necessary ABI details, and adding a
> > > > CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a
> > > > reasonable trade-off.
> > > >
> > > > The first 12 patches of this series add a small tool for computing
> > > > symbol versions from DWARF, called gendwarfksyms. When passed a list
> > > > of exported symbols, the tool generates an expanded type string
> > > > for each symbol, and computes symbol CRCs similarly to genksyms.
> > > > gendwarfksyms is written in C and uses libdw to process DWARF, mainly
> > > > because of the existing support for C host tools that use elfutils
> > > > (e.g., objtool).
> > >
> > > That's cool, can the C code be switched to also use this?  That way we
> > > only have one path/code for all of this?
> >
> >
> > As the description says, it requires CONFIG_DEBUG_INFO.
> > We can strip the debug info from the final vmlinux, but
> > I guess the build speed will be even slower than the current genksyms.
>
> For people who want genksyms (i.e. distros), don't they normally already
> enable DEBUG_INFO as well?  The problems of genksyms are well known and
> a pain (I speak from experience), so replacing it with info based on
> DWARF would be great, I'll gladly trade off the DEBUG_INFO issue for
> stablilty!
>
> thanks,
>
> greg k-h
>



I do not think gendwarfksyms is a drop-in replacement,
because it relies on libelf and libdw, which will not
work with LLVM bitcode when CONFIG_LTO_CLANG=y.

His "Let's postpone this until final linking" stuff will
come back?
Then, vmlinux.o is processed to extract the CRC
of all symbols?


In my benchmark, this tool took 3.84 sec just for processing
a single rust/core.o object.

I'd love to see how long it will take to process vmlinux.o

And this occurs even when a single source file is changed
and vmlinux.o is re-linked.




--
Best Regards
Masahiro Yamada

Re: [PATCH v4 2/3] kbuild, kconfig: generate offset range data for builtin modules

2024-06-18 Thread Luis Chamberlain

On Fri, Jun 14, 2024 at 01:14:27PM -0400, Kris Van Hees wrote:
> The offset range data for builtin modules is generated using:
>  - modules.builtin: associates object files with module names
>  - vmlinux.map: provides load order of sections and offset of first member
> per section
>  - vmlinux.o.map: provides offset of object file content per section
>  - .*.cmd: build cmd file with KBUILD_MODFILE and KBUILD_MODNAME

What tests do we have to ensure this is working correctly and not
spewing out lies? What proactive mechanisms do we have to verify the
semantics won't change, or to warn at build time that this awk script
will break upon new changes? Is this just best effort? Is that good
enough? Why?

  Luis

Re: [PATCH] module: Add log information for loading module failures

2024-06-18 Thread Luis Chamberlain

On Fri, Jun 14, 2024 at 09:25:19AM +, Yusong Gao wrote:
> Add log information in kernel-space when loading module failures.
> Try to load the unsigned module and the module with bad signature
> when set 1 to /sys/module/module/parameters/sig_enforce.
> 
> Unsigned module case:
> (linux) insmod unsigned.ko
> [   18.714661] Loading of unsigned module is rejected
> insmod: can't insert 'unsigned.ko': Key was rejected by service
> (linux)
> 
> Bad signature module case:
> (linux) insmod bad_signature.ko
> insmod: can't insert 'bad_signature.ko': Key was rejected by service
> (linux)
> 
> There have different logging behavior the bad signature case only log
> in user-space, add log info for fatal errors in module_sig_check().
> 
> Signed-off-by: Yusong Gao 
> ---
>  kernel/module/signing.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/module/signing.c b/kernel/module/signing.c
> index a2ff4242e623..6a6493c8f7e4 100644
> --- a/kernel/module/signing.c
> +++ b/kernel/module/signing.c
> @@ -113,6 +113,7 @@ int module_sig_check(struct load_info *info, int flags)
>* unparseable signatures, and signature check failures --
>* even if signatures aren't required.
>*/
> + pr_notice("Loading module failed (errno=%d)\n", -err);
>   return err;

I welcome pr_debug() messages but if we were to add a regular print for every
single type of failure we'd clutter the code, we don't want that.

  Luis

Re: [PATCH v5] remoteproc: xlnx: add attach detach support

2024-06-18 Thread Tanmay Shah




On 6/18/24 11:55 AM, Mathieu Poirier wrote:
> On Tue, Jun 18, 2024 at 11:45:28AM -0500, Tanmay Shah wrote:
>> 
>> 
>> On 6/17/24 10:40 AM, Mathieu Poirier wrote:
>> > Good day,
>> > 
>> > On Mon, Jun 10, 2024 at 08:42:27AM -0700, Tanmay Shah wrote:
>> >> It is possible that remote processor is already running before
>> >> linux boot or remoteproc platform driver probe. Implement required
>> >> remoteproc framework ops to provide resource table address and
>> >> connect or disconnect with remote processor in such case.
>> >> 
>> >> Signed-off-by: Tanmay Shah 
>> >> ---
>> >> 
>> >> Changes in v5:
>> >>   - Fix comment on assigning DETACHED state to remoteproc instance
>> >> during driver probe.
>> >>   - Fix patch subject and remove "drivers"
>> >> 
>> >> Changes in v4:
>> >>   - Move change log out of commit text
>> >> 
>> >> Changes in v3:
>> >>   - Drop SRAM patch from the series
>> >>   - Change type from "struct resource_table *" to void __iomem *
>> >>   - Change comment format from /** to /*
>> >>   - Remove unmap of resource table va address during detach, allowing
>> >> attach-detach-reattach use case.
>> >>   - Unmap rsc_data_va after retrieving resource table data structure.
>> >>   - Unmap resource table va during driver remove op
>> >> 
>> >> Changes in v2:
>> >>   - Fix typecast warnings reported using sparse tool.
>> >>   - Fix following sparse warnings:
>> >> 
>> >>  drivers/remoteproc/xlnx_r5_remoteproc.c | 173 +++-
>> >>  1 file changed, 169 insertions(+), 4 deletions(-)
>> >> 
>> >> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
>> >> b/drivers/remoteproc/xlnx_r5_remoteproc.c
>> >> index 84243d1dff9f..6ddce5650f95 100644
>> >> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
>> >> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
>> >> @@ -25,6 +25,10 @@
>> >>  /* RX mailbox client buffer max length */
>> >>  #define MBOX_CLIENT_BUF_MAX  (IPI_BUF_LEN_MAX + \
>> >>sizeof(struct zynqmp_ipi_message))
>> >> +
>> >> +#define RSC_TBL_XLNX_MAGIC   ((uint32_t)'x' << 24 | (uint32_t)'a' << 
>> >> 16 | \
>> >> +  (uint32_t)'m' << 8 | (uint32_t)'p')
>> >> +
>> >>  /*
>> >>   * settings for RPU cluster mode which
>> >>   * reflects possible values of xlnx,cluster-mode dt-property
>> >> @@ -73,6 +77,26 @@ struct mbox_info {
>> >>   struct mbox_chan *rx_chan;
>> >>  };
>> >>  
>> >> +/**
>> >> + * struct rsc_tbl_data
>> >> + *
>> >> + * Platform specific data structure used to sync resource table address.
>> >> + * It's important to maintain order and size of each field on remote 
>> >> side.
>> >> + *
>> >> + * @version: version of data structure
>> >> + * @magic_num: 32-bit magic number.
>> >> + * @comp_magic_num: complement of above magic number
>> >> + * @rsc_tbl_size: resource table size
>> >> + * @rsc_tbl: resource table address
>> >> + */
>> >> +struct rsc_tbl_data {
>> >> + const int version;
>> >> + const u32 magic_num;
>> >> + const u32 comp_magic_num;
>> >> + const u32 rsc_tbl_size;
>> >> + const uintptr_t rsc_tbl;
>> >> +} __packed;
>> >> +
>> >>  /*
>> >>   * Hardcoded TCM bank values. This will stay in driver to maintain 
>> >> backward
>> >>   * compatibility with device-tree that does not have TCM information.
>> >> @@ -95,20 +119,24 @@ static const struct mem_bank_data 
>> >> zynqmp_tcm_banks_lockstep[] = {
>> >>  /**
>> >>   * struct zynqmp_r5_core
>> >>   *
>> >> + * @rsc_tbl_va: resource table virtual address
>> >>   * @dev: device of RPU instance
>> >>   * @np: device node of RPU instance
>> >>   * @tcm_bank_count: number TCM banks accessible to this RPU
>> >>   * @tcm_banks: array of each TCM bank data
>> >>   * @rproc: rproc handle
>> >> + * @rsc_tbl_size: resource table size retrieved from remote
>> >>   * @pm_domain_id: RPU CPU power domain id
>> >>   * @ipi: pointer to mailbox information
>> >>   */
>> >>  struct zynqmp_r5_core {
>> >> + void __iomem *rsc_tbl_va;
>> >>   struct device *dev;
>> >>   struct device_node *np;
>> >>   int tcm_bank_count;
>> >>   struct mem_bank_data **tcm_banks;
>> >>   struct rproc *rproc;
>> >> + u32 rsc_tbl_size;
>> >>   u32 pm_domain_id;
>> >>   struct mbox_info *ipi;
>> >>  };
>> >> @@ -621,10 +649,19 @@ static int zynqmp_r5_rproc_prepare(struct rproc 
>> >> *rproc)
>> >>  {
>> >>   int ret;
>> >>  
>> >> - ret = add_tcm_banks(rproc);
>> >> - if (ret) {
>> >> - dev_err(>dev, "failed to get TCM banks, err %d\n", ret);
>> >> - return ret;
>> >> + /*
>> >> +  * For attach/detach use case, Firmware is already loaded so
>> >> +  * TCM isn't really needed at all. Also, for security TCM can be
>> >> +  * locked in such case and linux may not have access at all.
>> >> +  * So avoid adding TCM banks. TCM power-domains requested during attach
>> >> +  * callback.
>> >> +  */
>> >> + if (rproc->state != RPROC_DETACHED) {
>> >> + ret = add_tcm_banks(rproc);
>> >> + if (ret) {
>> >> + dev_err(>dev, "failed to get TCM

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Greg Kroah-Hartman

On Wed, Jun 19, 2024 at 01:50:36AM +0900, Masahiro Yamada wrote:
> On Wed, Jun 19, 2024 at 1:44 AM Greg Kroah-Hartman
>  wrote:
> >
> > On Mon, Jun 17, 2024 at 05:58:19PM +, Sami Tolvanen wrote:
> > > Hi folks,
> > >
> > > This series implements CONFIG_MODVERSIONS for Rust, an important
> > > feature for distributions like Android that want to ship Rust
> > > kernel modules, and depend on modversions to help ensure module ABI
> > > compatibility.
> > >
> > > There have been earlier proposals [1][2] that would allow Rust
> > > modules to coexist with modversions, but none that actually implement
> > > symbol versioning. Unlike C, Rust source code doesn't have sufficient
> > > information about the final ABI, as the compiler has considerable
> > > freedom in adjusting structure layout for improved performance [3],
> > > for example, which makes using a source code parser like genksyms
> > > a non-starter. Based on Matt's suggestion and previous feedback
> > > from maintainers, this series uses DWARF debugging information for
> > > computing versions. DWARF is an established and relatively stable
> > > format, which includes all the necessary ABI details, and adding a
> > > CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a
> > > reasonable trade-off.
> > >
> > > The first 12 patches of this series add a small tool for computing
> > > symbol versions from DWARF, called gendwarfksyms. When passed a list
> > > of exported symbols, the tool generates an expanded type string
> > > for each symbol, and computes symbol CRCs similarly to genksyms.
> > > gendwarfksyms is written in C and uses libdw to process DWARF, mainly
> > > because of the existing support for C host tools that use elfutils
> > > (e.g., objtool).
> >
> > That's cool, can the C code be switched to also use this?  That way we
> > only have one path/code for all of this?
> 
> 
> As the description says, it requires CONFIG_DEBUG_INFO.
> We can strip the debug info from the final vmlinux, but
> I guess the build speed will be even slower than the current genksyms.

For people who want genksyms (i.e. distros), don't they normally already
enable DEBUG_INFO as well?  The problems of genksyms are well known and
a pain (I speak from experience), so replacing it with info based on
DWARF would be great, I'll gladly trade off the DEBUG_INFO issue for
stablilty!

thanks,

greg k-h

Re: [PATCH v10 2/8] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-06-18 Thread Andrew Davis


On 6/18/24 12:05 PM, Mathieu Poirier wrote:

Good morning,

On Mon, Jun 10, 2024 at 01:06:09PM -0500, Andrew Davis wrote:

From: Martyn Welch 

The AM62x and AM64x SoCs of the TI K3 family has a Cortex M4F core in
the MCU domain. This core is typically used for safety applications in a
stand alone mode. However, some application (non safety related) may
want to use the M4F core as a generic remote processor with IPC to the
host processor. The M4F core has internal IRAM and DRAM memories and are
exposed to the system bus for code and data loading.

A remote processor driver is added to support this subsystem, including
being able to load and boot the M4F core. Loading includes to M4F
internal memories and predefined external code/data memories. The
carve outs for external contiguous memory is defined in the M4F device
node and should match with the external memory declarations in the M4F
image binary. The M4F subsystem has two resets. One reset is for the
entire subsystem i.e including the internal memories and the other, a
local reset is only for the M4F processing core. When loading the image,
the driver first releases the subsystem reset, loads the firmware image
and then releases the local reset to let the M4F processing core run.

Signed-off-by: Martyn Welch 
Signed-off-by: Hari Nagalla 
Signed-off-by: Andrew Davis 
---
  drivers/remoteproc/Kconfig   |  13 +
  drivers/remoteproc/Makefile  |   1 +
  drivers/remoteproc/ti_k3_m4_remoteproc.c | 760 +++
  3 files changed, 774 insertions(+)
  create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index 48845dc8fa852..1a7c0330c91a9 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
  It's safe to say N here if you're not interested in utilizing
  the DSP slave processors.
  
+config TI_K3_M4_REMOTEPROC

+   tristate "TI K3 M4 remoteproc support"
+   depends on ARCH_K3 || COMPILE_TEST
+   select MAILBOX
+   select OMAP2PLUS_MBOX
+   help
+ Say m here to support TI's M4 remote processor subsystems
+ on various TI K3 family of SoCs through the remote processor
+ framework.
+
+ It's safe to say N here if you're not interested in utilizing
+ a remote processor.
+
  config TI_K3_R5_REMOTEPROC
tristate "TI K3 R5 remoteproc support"
depends on ARCH_K3
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index 91314a9b43cef..5ff4e2fee4abd 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC)   += st_remoteproc.o
  obj-$(CONFIG_ST_SLIM_REMOTEPROC)  += st_slim_rproc.o
  obj-$(CONFIG_STM32_RPROC) += stm32_rproc.o
  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC)+= ti_k3_dsp_remoteproc.o
+obj-$(CONFIG_TI_K3_M4_REMOTEPROC)  += ti_k3_m4_remoteproc.o
  obj-$(CONFIG_TI_K3_R5_REMOTEPROC) += ti_k3_r5_remoteproc.o
  obj-$(CONFIG_XLNX_R5_REMOTEPROC)  += xlnx_r5_remoteproc.o
diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
b/drivers/remoteproc/ti_k3_m4_remoteproc.c
new file mode 100644
index 0..880e8f0ad1ba3
--- /dev/null
+++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
@@ -0,0 +1,760 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * TI K3 Cortex-M4 Remote Processor(s) driver
+ *
+ * Copyright (C) 2021-2024 Texas Instruments Incorporated - https://www.ti.com/
+ * Hari Nagalla 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "omap_remoteproc.h"
+#include "remoteproc_internal.h"
+#include "ti_sci_proc.h"
+
+/**
+ * struct k3_m4_rproc_mem - internal memory structure
+ * @cpu_addr: MPU virtual address of the memory region
+ * @bus_addr: Bus address used to access the memory region
+ * @dev_addr: Device address of the memory region from remote processor view
+ * @size: Size of the memory region
+ */
+struct k3_m4_rproc_mem {
+   void __iomem *cpu_addr;
+   phys_addr_t bus_addr;
+   u32 dev_addr;
+   size_t size;
+};
+
+/**
+ * struct k3_m4_rproc_mem_data - memory definitions for a remote processor
+ * @name: name for this memory entry
+ * @dev_addr: device address for the memory entry
+ */
+struct k3_m4_rproc_mem_data {
+   const char *name;
+   const u32 dev_addr;
+};
+
+/**
+ * struct k3_m4_rproc_dev_data - device data structure for a remote processor
+ * @mems: pointer to memory definitions for a remote processor
+ * @num_mems: number of memory regions in @mems
+ */
+struct k3_m4_rproc_dev_data {
+   const struct k3_m4_rproc_mem_data *mems;
+   u32 num_mems;
+};
+
+/**
+ * struct k3_m4_rproc - k3 remote processor driver structure
+ * @dev: cached device pointer
+ * @rproc: remoteproc device handle
+ * @mem: internal memory regions data
+ * @num_mems: number of internal

Re: [PATCH v10 2/8] remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem

2024-06-18 Thread Mathieu Poirier

Good morning,

On Mon, Jun 10, 2024 at 01:06:09PM -0500, Andrew Davis wrote:
> From: Martyn Welch 
> 
> The AM62x and AM64x SoCs of the TI K3 family has a Cortex M4F core in
> the MCU domain. This core is typically used for safety applications in a
> stand alone mode. However, some application (non safety related) may
> want to use the M4F core as a generic remote processor with IPC to the
> host processor. The M4F core has internal IRAM and DRAM memories and are
> exposed to the system bus for code and data loading.
> 
> A remote processor driver is added to support this subsystem, including
> being able to load and boot the M4F core. Loading includes to M4F
> internal memories and predefined external code/data memories. The
> carve outs for external contiguous memory is defined in the M4F device
> node and should match with the external memory declarations in the M4F
> image binary. The M4F subsystem has two resets. One reset is for the
> entire subsystem i.e including the internal memories and the other, a
> local reset is only for the M4F processing core. When loading the image,
> the driver first releases the subsystem reset, loads the firmware image
> and then releases the local reset to let the M4F processing core run.
> 
> Signed-off-by: Martyn Welch 
> Signed-off-by: Hari Nagalla 
> Signed-off-by: Andrew Davis 
> ---
>  drivers/remoteproc/Kconfig   |  13 +
>  drivers/remoteproc/Makefile  |   1 +
>  drivers/remoteproc/ti_k3_m4_remoteproc.c | 760 +++
>  3 files changed, 774 insertions(+)
>  create mode 100644 drivers/remoteproc/ti_k3_m4_remoteproc.c
> 
> diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
> index 48845dc8fa852..1a7c0330c91a9 100644
> --- a/drivers/remoteproc/Kconfig
> +++ b/drivers/remoteproc/Kconfig
> @@ -339,6 +339,19 @@ config TI_K3_DSP_REMOTEPROC
> It's safe to say N here if you're not interested in utilizing
> the DSP slave processors.
>  
> +config TI_K3_M4_REMOTEPROC
> + tristate "TI K3 M4 remoteproc support"
> + depends on ARCH_K3 || COMPILE_TEST
> + select MAILBOX
> + select OMAP2PLUS_MBOX
> + help
> +   Say m here to support TI's M4 remote processor subsystems
> +   on various TI K3 family of SoCs through the remote processor
> +   framework.
> +
> +   It's safe to say N here if you're not interested in utilizing
> +   a remote processor.
> +
>  config TI_K3_R5_REMOTEPROC
>   tristate "TI K3 R5 remoteproc support"
>   depends on ARCH_K3
> diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
> index 91314a9b43cef..5ff4e2fee4abd 100644
> --- a/drivers/remoteproc/Makefile
> +++ b/drivers/remoteproc/Makefile
> @@ -37,5 +37,6 @@ obj-$(CONFIG_ST_REMOTEPROC) += st_remoteproc.o
>  obj-$(CONFIG_ST_SLIM_REMOTEPROC) += st_slim_rproc.o
>  obj-$(CONFIG_STM32_RPROC)+= stm32_rproc.o
>  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC)   += ti_k3_dsp_remoteproc.o
> +obj-$(CONFIG_TI_K3_M4_REMOTEPROC)+= ti_k3_m4_remoteproc.o
>  obj-$(CONFIG_TI_K3_R5_REMOTEPROC)+= ti_k3_r5_remoteproc.o
>  obj-$(CONFIG_XLNX_R5_REMOTEPROC) += xlnx_r5_remoteproc.o
> diff --git a/drivers/remoteproc/ti_k3_m4_remoteproc.c 
> b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> new file mode 100644
> index 0..880e8f0ad1ba3
> --- /dev/null
> +++ b/drivers/remoteproc/ti_k3_m4_remoteproc.c
> @@ -0,0 +1,760 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * TI K3 Cortex-M4 Remote Processor(s) driver
> + *
> + * Copyright (C) 2021-2024 Texas Instruments Incorporated - 
> https://www.ti.com/
> + *   Hari Nagalla 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "omap_remoteproc.h"
> +#include "remoteproc_internal.h"
> +#include "ti_sci_proc.h"
> +
> +/**
> + * struct k3_m4_rproc_mem - internal memory structure
> + * @cpu_addr: MPU virtual address of the memory region
> + * @bus_addr: Bus address used to access the memory region
> + * @dev_addr: Device address of the memory region from remote processor view
> + * @size: Size of the memory region
> + */
> +struct k3_m4_rproc_mem {
> + void __iomem *cpu_addr;
> + phys_addr_t bus_addr;
> + u32 dev_addr;
> + size_t size;
> +};
> +
> +/**
> + * struct k3_m4_rproc_mem_data - memory definitions for a remote processor
> + * @name: name for this memory entry
> + * @dev_addr: device address for the memory entry
> + */
> +struct k3_m4_rproc_mem_data {
> + const char *name;
> + const u32 dev_addr;
> +};
> +
> +/**
> + * struct k3_m4_rproc_dev_data - device data structure for a remote processor
> + * @mems: pointer to memory definitions for a remote processor
> + * @num_mems: number of memory regions in @mems
> + */
> +struct k3_m4_rproc_dev_data {
> + const struct k3_m4_rproc_mem_data *mems;
> + u32 num_mems;
> +};
> +
> +/**
> + * struct k3_m4_rproc - k3 remote processor driver

Re: [PATCH v5] remoteproc: xlnx: add attach detach support

2024-06-18 Thread Mathieu Poirier

On Tue, Jun 18, 2024 at 11:45:28AM -0500, Tanmay Shah wrote:
> 
> 
> On 6/17/24 10:40 AM, Mathieu Poirier wrote:
> > Good day,
> > 
> > On Mon, Jun 10, 2024 at 08:42:27AM -0700, Tanmay Shah wrote:
> >> It is possible that remote processor is already running before
> >> linux boot or remoteproc platform driver probe. Implement required
> >> remoteproc framework ops to provide resource table address and
> >> connect or disconnect with remote processor in such case.
> >> 
> >> Signed-off-by: Tanmay Shah 
> >> ---
> >> 
> >> Changes in v5:
> >>   - Fix comment on assigning DETACHED state to remoteproc instance
> >> during driver probe.
> >>   - Fix patch subject and remove "drivers"
> >> 
> >> Changes in v4:
> >>   - Move change log out of commit text
> >> 
> >> Changes in v3:
> >>   - Drop SRAM patch from the series
> >>   - Change type from "struct resource_table *" to void __iomem *
> >>   - Change comment format from /** to /*
> >>   - Remove unmap of resource table va address during detach, allowing
> >> attach-detach-reattach use case.
> >>   - Unmap rsc_data_va after retrieving resource table data structure.
> >>   - Unmap resource table va during driver remove op
> >> 
> >> Changes in v2:
> >>   - Fix typecast warnings reported using sparse tool.
> >>   - Fix following sparse warnings:
> >> 
> >>  drivers/remoteproc/xlnx_r5_remoteproc.c | 173 +++-
> >>  1 file changed, 169 insertions(+), 4 deletions(-)
> >> 
> >> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
> >> b/drivers/remoteproc/xlnx_r5_remoteproc.c
> >> index 84243d1dff9f..6ddce5650f95 100644
> >> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> >> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> >> @@ -25,6 +25,10 @@
> >>  /* RX mailbox client buffer max length */
> >>  #define MBOX_CLIENT_BUF_MAX   (IPI_BUF_LEN_MAX + \
> >> sizeof(struct zynqmp_ipi_message))
> >> +
> >> +#define RSC_TBL_XLNX_MAGIC((uint32_t)'x' << 24 | (uint32_t)'a' << 
> >> 16 | \
> >> +   (uint32_t)'m' << 8 | (uint32_t)'p')
> >> +
> >>  /*
> >>   * settings for RPU cluster mode which
> >>   * reflects possible values of xlnx,cluster-mode dt-property
> >> @@ -73,6 +77,26 @@ struct mbox_info {
> >>struct mbox_chan *rx_chan;
> >>  };
> >>  
> >> +/**
> >> + * struct rsc_tbl_data
> >> + *
> >> + * Platform specific data structure used to sync resource table address.
> >> + * It's important to maintain order and size of each field on remote side.
> >> + *
> >> + * @version: version of data structure
> >> + * @magic_num: 32-bit magic number.
> >> + * @comp_magic_num: complement of above magic number
> >> + * @rsc_tbl_size: resource table size
> >> + * @rsc_tbl: resource table address
> >> + */
> >> +struct rsc_tbl_data {
> >> +  const int version;
> >> +  const u32 magic_num;
> >> +  const u32 comp_magic_num;
> >> +  const u32 rsc_tbl_size;
> >> +  const uintptr_t rsc_tbl;
> >> +} __packed;
> >> +
> >>  /*
> >>   * Hardcoded TCM bank values. This will stay in driver to maintain 
> >> backward
> >>   * compatibility with device-tree that does not have TCM information.
> >> @@ -95,20 +119,24 @@ static const struct mem_bank_data 
> >> zynqmp_tcm_banks_lockstep[] = {
> >>  /**
> >>   * struct zynqmp_r5_core
> >>   *
> >> + * @rsc_tbl_va: resource table virtual address
> >>   * @dev: device of RPU instance
> >>   * @np: device node of RPU instance
> >>   * @tcm_bank_count: number TCM banks accessible to this RPU
> >>   * @tcm_banks: array of each TCM bank data
> >>   * @rproc: rproc handle
> >> + * @rsc_tbl_size: resource table size retrieved from remote
> >>   * @pm_domain_id: RPU CPU power domain id
> >>   * @ipi: pointer to mailbox information
> >>   */
> >>  struct zynqmp_r5_core {
> >> +  void __iomem *rsc_tbl_va;
> >>struct device *dev;
> >>struct device_node *np;
> >>int tcm_bank_count;
> >>struct mem_bank_data **tcm_banks;
> >>struct rproc *rproc;
> >> +  u32 rsc_tbl_size;
> >>u32 pm_domain_id;
> >>struct mbox_info *ipi;
> >>  };
> >> @@ -621,10 +649,19 @@ static int zynqmp_r5_rproc_prepare(struct rproc 
> >> *rproc)
> >>  {
> >>int ret;
> >>  
> >> -  ret = add_tcm_banks(rproc);
> >> -  if (ret) {
> >> -  dev_err(>dev, "failed to get TCM banks, err %d\n", ret);
> >> -  return ret;
> >> +  /*
> >> +   * For attach/detach use case, Firmware is already loaded so
> >> +   * TCM isn't really needed at all. Also, for security TCM can be
> >> +   * locked in such case and linux may not have access at all.
> >> +   * So avoid adding TCM banks. TCM power-domains requested during attach
> >> +   * callback.
> >> +   */
> >> +  if (rproc->state != RPROC_DETACHED) {
> >> +  ret = add_tcm_banks(rproc);
> >> +  if (ret) {
> >> +  dev_err(>dev, "failed to get TCM banks, err 
> >> %d\n", ret);
> >> +  return ret;
> >> +  }
> > 
> > In the normal case function add_tcm_banks() will call 
> >

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Masahiro Yamada

On Wed, Jun 19, 2024 at 1:44 AM Greg Kroah-Hartman
 wrote:
>
> On Mon, Jun 17, 2024 at 05:58:19PM +, Sami Tolvanen wrote:
> > Hi folks,
> >
> > This series implements CONFIG_MODVERSIONS for Rust, an important
> > feature for distributions like Android that want to ship Rust
> > kernel modules, and depend on modversions to help ensure module ABI
> > compatibility.
> >
> > There have been earlier proposals [1][2] that would allow Rust
> > modules to coexist with modversions, but none that actually implement
> > symbol versioning. Unlike C, Rust source code doesn't have sufficient
> > information about the final ABI, as the compiler has considerable
> > freedom in adjusting structure layout for improved performance [3],
> > for example, which makes using a source code parser like genksyms
> > a non-starter. Based on Matt's suggestion and previous feedback
> > from maintainers, this series uses DWARF debugging information for
> > computing versions. DWARF is an established and relatively stable
> > format, which includes all the necessary ABI details, and adding a
> > CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a
> > reasonable trade-off.
> >
> > The first 12 patches of this series add a small tool for computing
> > symbol versions from DWARF, called gendwarfksyms. When passed a list
> > of exported symbols, the tool generates an expanded type string
> > for each symbol, and computes symbol CRCs similarly to genksyms.
> > gendwarfksyms is written in C and uses libdw to process DWARF, mainly
> > because of the existing support for C host tools that use elfutils
> > (e.g., objtool).
>
> That's cool, can the C code be switched to also use this?  That way we
> only have one path/code for all of this?


As the description says, it requires CONFIG_DEBUG_INFO.
We can strip the debug info from the final vmlinux, but
I guess the build speed will be even slower than the current genksyms.




-- 
Best Regards
Masahiro Yamada

Re: [PATCH 13/15] modpost: Add support for hashing long symbol names

2024-06-18 Thread Masahiro Yamada

On Tue, Jun 18, 2024 at 2:58 AM Sami Tolvanen  wrote:
>
> Rust frequently has symbol names longer than MODULE_NAME_LEN, because
> the full namespace path is encoded into the mangled name. Instead of
> modpost failing when running into a long name with CONFIG_MODVERSIONS,
> store a hash of the name in struct modversion_info.
>
> To avoid breaking userspace tools that parse the __versions array,
> include a null-terminated hash function name at the beginning of the
> name string, followed by a binary hash. In order to keep .mod.c files
> more human-readable, add a comment with the full symbol name before the
> array entry. Example output in rust_minimal.mod.c:
>
>   static const struct modversion_info versions[]
>   __used __section("__versions") = {
> /* _RNvNtNtCs1cdwasc6FUb_6kernel5print14format_strings4INFO */
> { 0x9ec5442f, "sha256\x00\x56\x96\xf4\x27\xdb\x4a\xbf[...]" },
> { 0x1d6595b1, "_RNvNtCs1cdwasc6FUb_6kernel5print11call_printk" },
> { 0x3c642974, "__rust_dealloc" },
> ...
>   };
>
> modprobe output for the resulting module:
>
>   $ modprobe --dump-modversions rust_minimal.ko
>   0x9ec5442fsha256
>   0x1d6595b1_RNvNtCs1cdwasc6FUb_6kernel5print11call_printk
>   0x3c642974__rust_dealloc
>   ...
>
> While the output is less useful, the tool continues to work and can be
> later updated to produce more helpful output for hashed symbols.
>
> Note that this patch adds a generic SHA-256 implementation to modpost
> adapted from the Crypto API, but other hash functions may be used in
> future if needed.
>
> Suggested-by: Matthew Maurer 
> Signed-off-by: Sami Tolvanen 
> ---
>  scripts/mod/Makefile  |   4 +-
>  scripts/mod/modpost.c |  20 ++-
>  scripts/mod/modpost.h |  20 +++
>  scripts/mod/symhash.c | 327 ++
>  4 files changed, 364 insertions(+), 7 deletions(-)
>  create mode 100644 scripts/mod/symhash.c
>
> diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile
> index c729bc936bae..ddd59ea9027e 100644
> --- a/scripts/mod/Makefile
> +++ b/scripts/mod/Makefile
> @@ -4,7 +4,7 @@ CFLAGS_REMOVE_empty.o += $(CC_FLAGS_LTO)
>  hostprogs-always-y += modpost mk_elfconfig
>  always-y   += empty.o
>
> -modpost-objs   := modpost.o file2alias.o sumversion.o symsearch.o
> +modpost-objs   := modpost.o file2alias.o symhash.o sumversion.o symsearch.o
>
>  devicetable-offsets-file := devicetable-offsets.h
>
> @@ -15,7 +15,7 @@ targets += $(devicetable-offsets-file) devicetable-offsets.s
>
>  # dependencies on generated files need to be listed explicitly
>
> -$(obj)/modpost.o $(obj)/file2alias.o $(obj)/sumversion.o $(obj)/symsearch.o: 
> $(obj)/elfconfig.h
> +$(obj)/modpost.o $(obj)/file2alias.o $(obj)/symhash.o $(obj)/sumversion.o 
> $(obj)/symsearch.o: $(obj)/elfconfig.h
>  $(obj)/file2alias.o: $(obj)/$(devicetable-offsets-file)
>
>  quiet_cmd_elfconfig = MKELF   $@
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index f48d72d22dc2..2631e3e75a5c 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -1900,7 +1900,10 @@ static void add_exported_symbols(struct buffer *buf, 
> struct module *mod)
>   **/
>  static void add_versions(struct buffer *b, struct module *mod)
>  {
> +   char hash[SYMHASH_STR_LEN];
> struct symbol *s;
> +   const char *name;
> +   size_t len;
>
> if (!modversions)
> return;
> @@ -1917,13 +1920,20 @@ static void add_versions(struct buffer *b, struct 
> module *mod)
> s->name, mod->name);
> continue;
> }
> -   if (strlen(s->name) >= MODULE_NAME_LEN) {
> -   error("too long symbol \"%s\" [%s.ko]\n",
> - s->name, mod->name);
> -   break;
> +   len = strlen(s->name);
> +   /*
> +* For symbols with a long name, use the hash format, but 
> include
> +* the full symbol name as a comment to keep the generated 
> files
> +* human-readable.
> +*/
> +   if (len >= MODULE_NAME_LEN) {
> +   buf_printf(b, "\t/* %s */\n", s->name);
> +   name = symhash_str(s->name, len, hash);
> +   } else {
> +   name = s->name;
> }
> buf_printf(b, "\t{ %#8x, \"%s\" },\n",
> -  s->crc, s->name);
> +  s->crc, name);
> }
>
> buf_printf(b, "};\n");
> diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
> index ee43c7950636..cd2eb718936b 100644
> --- a/scripts/mod/modpost.h
> +++ b/scripts/mod/modpost.h
> @@ -183,6 +183,26 @@ void handle_moddevtable(struct module *mod, struct 
> elf_info *info,
> Elf_Sym *sym, const char *symname);
>  void add_moddevtable(struct buffer *buf, struct module *mod);
>
> +/*

Re: [PATCH v5] remoteproc: xlnx: add attach detach support

2024-06-18 Thread Tanmay Shah




On 6/17/24 10:40 AM, Mathieu Poirier wrote:
> Good day,
> 
> On Mon, Jun 10, 2024 at 08:42:27AM -0700, Tanmay Shah wrote:
>> It is possible that remote processor is already running before
>> linux boot or remoteproc platform driver probe. Implement required
>> remoteproc framework ops to provide resource table address and
>> connect or disconnect with remote processor in such case.
>> 
>> Signed-off-by: Tanmay Shah 
>> ---
>> 
>> Changes in v5:
>>   - Fix comment on assigning DETACHED state to remoteproc instance
>> during driver probe.
>>   - Fix patch subject and remove "drivers"
>> 
>> Changes in v4:
>>   - Move change log out of commit text
>> 
>> Changes in v3:
>>   - Drop SRAM patch from the series
>>   - Change type from "struct resource_table *" to void __iomem *
>>   - Change comment format from /** to /*
>>   - Remove unmap of resource table va address during detach, allowing
>> attach-detach-reattach use case.
>>   - Unmap rsc_data_va after retrieving resource table data structure.
>>   - Unmap resource table va during driver remove op
>> 
>> Changes in v2:
>>   - Fix typecast warnings reported using sparse tool.
>>   - Fix following sparse warnings:
>> 
>>  drivers/remoteproc/xlnx_r5_remoteproc.c | 173 +++-
>>  1 file changed, 169 insertions(+), 4 deletions(-)
>> 
>> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
>> b/drivers/remoteproc/xlnx_r5_remoteproc.c
>> index 84243d1dff9f..6ddce5650f95 100644
>> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
>> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
>> @@ -25,6 +25,10 @@
>>  /* RX mailbox client buffer max length */
>>  #define MBOX_CLIENT_BUF_MAX (IPI_BUF_LEN_MAX + \
>>   sizeof(struct zynqmp_ipi_message))
>> +
>> +#define RSC_TBL_XLNX_MAGIC  ((uint32_t)'x' << 24 | (uint32_t)'a' << 16 | \
>> + (uint32_t)'m' << 8 | (uint32_t)'p')
>> +
>>  /*
>>   * settings for RPU cluster mode which
>>   * reflects possible values of xlnx,cluster-mode dt-property
>> @@ -73,6 +77,26 @@ struct mbox_info {
>>  struct mbox_chan *rx_chan;
>>  };
>>  
>> +/**
>> + * struct rsc_tbl_data
>> + *
>> + * Platform specific data structure used to sync resource table address.
>> + * It's important to maintain order and size of each field on remote side.
>> + *
>> + * @version: version of data structure
>> + * @magic_num: 32-bit magic number.
>> + * @comp_magic_num: complement of above magic number
>> + * @rsc_tbl_size: resource table size
>> + * @rsc_tbl: resource table address
>> + */
>> +struct rsc_tbl_data {
>> +const int version;
>> +const u32 magic_num;
>> +const u32 comp_magic_num;
>> +const u32 rsc_tbl_size;
>> +const uintptr_t rsc_tbl;
>> +} __packed;
>> +
>>  /*
>>   * Hardcoded TCM bank values. This will stay in driver to maintain backward
>>   * compatibility with device-tree that does not have TCM information.
>> @@ -95,20 +119,24 @@ static const struct mem_bank_data 
>> zynqmp_tcm_banks_lockstep[] = {
>>  /**
>>   * struct zynqmp_r5_core
>>   *
>> + * @rsc_tbl_va: resource table virtual address
>>   * @dev: device of RPU instance
>>   * @np: device node of RPU instance
>>   * @tcm_bank_count: number TCM banks accessible to this RPU
>>   * @tcm_banks: array of each TCM bank data
>>   * @rproc: rproc handle
>> + * @rsc_tbl_size: resource table size retrieved from remote
>>   * @pm_domain_id: RPU CPU power domain id
>>   * @ipi: pointer to mailbox information
>>   */
>>  struct zynqmp_r5_core {
>> +void __iomem *rsc_tbl_va;
>>  struct device *dev;
>>  struct device_node *np;
>>  int tcm_bank_count;
>>  struct mem_bank_data **tcm_banks;
>>  struct rproc *rproc;
>> +u32 rsc_tbl_size;
>>  u32 pm_domain_id;
>>  struct mbox_info *ipi;
>>  };
>> @@ -621,10 +649,19 @@ static int zynqmp_r5_rproc_prepare(struct rproc *rproc)
>>  {
>>  int ret;
>>  
>> -ret = add_tcm_banks(rproc);
>> -if (ret) {
>> -dev_err(>dev, "failed to get TCM banks, err %d\n", ret);
>> -return ret;
>> +/*
>> + * For attach/detach use case, Firmware is already loaded so
>> + * TCM isn't really needed at all. Also, for security TCM can be
>> + * locked in such case and linux may not have access at all.
>> + * So avoid adding TCM banks. TCM power-domains requested during attach
>> + * callback.
>> + */
>> +if (rproc->state != RPROC_DETACHED) {
>> +ret = add_tcm_banks(rproc);
>> +if (ret) {
>> +dev_err(>dev, "failed to get TCM banks, err 
>> %d\n", ret);
>> +return ret;
>> +}
> 
> In the normal case function add_tcm_banks() will call zynqmp_pm_request_node()
> but in the attach case, that gets done in zynqmp_r5_attach().  Either way,
> zynqmp_pm_release_node() is called in zynqmp_r5_rproc_unprepare().  This is
> highly confusing.
> 
> I suggest adding a check to see if the remote processor is being attached to 
> in

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Greg Kroah-Hartman

On Mon, Jun 17, 2024 at 05:58:19PM +, Sami Tolvanen wrote:
> Hi folks,
> 
> This series implements CONFIG_MODVERSIONS for Rust, an important
> feature for distributions like Android that want to ship Rust
> kernel modules, and depend on modversions to help ensure module ABI
> compatibility.
> 
> There have been earlier proposals [1][2] that would allow Rust
> modules to coexist with modversions, but none that actually implement
> symbol versioning. Unlike C, Rust source code doesn't have sufficient
> information about the final ABI, as the compiler has considerable
> freedom in adjusting structure layout for improved performance [3],
> for example, which makes using a source code parser like genksyms
> a non-starter. Based on Matt's suggestion and previous feedback
> from maintainers, this series uses DWARF debugging information for
> computing versions. DWARF is an established and relatively stable
> format, which includes all the necessary ABI details, and adding a
> CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a
> reasonable trade-off.
> 
> The first 12 patches of this series add a small tool for computing
> symbol versions from DWARF, called gendwarfksyms. When passed a list
> of exported symbols, the tool generates an expanded type string
> for each symbol, and computes symbol CRCs similarly to genksyms.
> gendwarfksyms is written in C and uses libdw to process DWARF, mainly
> because of the existing support for C host tools that use elfutils
> (e.g., objtool).

That's cool, can the C code be switched to also use this?  That way we
only have one path/code for all of this?

thanks,

greg k-h

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Masahiro Yamada

On Tue, Jun 18, 2024 at 2:58 AM Sami Tolvanen  wrote:
>
> Hi folks,
>
> This series implements CONFIG_MODVERSIONS for Rust, an important
> feature for distributions like Android that want to ship Rust
> kernel modules, and depend on modversions to help ensure module ABI
> compatibility.
>
> There have been earlier proposals [1][2] that would allow Rust
> modules to coexist with modversions, but none that actually implement
> symbol versioning. Unlike C, Rust source code doesn't have sufficient
> information about the final ABI, as the compiler has considerable
> freedom in adjusting structure layout for improved performance [3],
> for example, which makes using a source code parser like genksyms
> a non-starter. Based on Matt's suggestion and previous feedback
> from maintainers, this series uses DWARF debugging information for
> computing versions. DWARF is an established and relatively stable
> format, which includes all the necessary ABI details, and adding a
> CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a
> reasonable trade-off.
>
> The first 12 patches of this series add a small tool for computing
> symbol versions from DWARF, called gendwarfksyms. When passed a list
> of exported symbols, the tool generates an expanded type string
> for each symbol, and computes symbol CRCs similarly to genksyms.
> gendwarfksyms is written in C and uses libdw to process DWARF, mainly
> because of the existing support for C host tools that use elfutils
> (e.g., objtool).
>
> Another compatibility issue is fitting the extremely long mangled
> Rust symbol names into struct modversion_info, which can only hold
> 55-character names (on 64-bit systems). Previous proposals suggested
> changing the structure to support longer names, but the conclusion was
> that we cannot break userspace tools that parse the version table.
>
> The next two patches of the series implement support for hashed
> symbol names in struct modversion_info, where names longer than 55
> characters are hashed, and the hash is stored in the name field. To
> avoid breaking userspace tools, the binary hash is prefixed with a
> null-terminated string containing the name of the hash function. While
> userspace tools can later be updated to potentially produce more
> useful information about the long symbols, this allows them to
> continue working in the meantime.
>
> The final patch allows CONFIG_MODVERSIONS to be selected with Rust,
> provided that debugging information is also available.




I am surprised at someone who attempts to add another variant of genksyms.


I am also surprised at the tool being added under the tools/ directory.

At least, we can avoid the latter misfortune, though.


A patch attached (on top of your patch set).

Such a tool can be compiled in scripts/, or even better
in rust/Makefile if it is only used in rust/Makefile.





--
Best Regards
Masahiro Yamada
From dcd8a348af34bc2cd89ef62765a716b8d63d5b0d Mon Sep 17 00:00:00 2001
From: Masahiro Yamada 
Date: Tue, 18 Jun 2024 10:58:55 +0900
Subject: [PATCH] kbuild: move tools/gendwarfksyms to scripts/gendwarfksyms

There is no good reason to use the tool's fragile build system.
Move tools/gendwarfksyms/ to scripts/gendwarfksyms/.

Add scripts/gendwarfksyms/.gitignore to ignore gendwarfksyms.

Signed-off-by: Masahiro Yamada 
---
 Makefile  |  6 ---
 rust/Makefile |  2 +-
 scripts/Makefile  |  1 +
 scripts/gendwarfksyms/.gitignore  |  2 +
 scripts/gendwarfksyms/Makefile| 10 
 {tools => scripts}/gendwarfksyms/cache.c  |  0
 {tools => scripts}/gendwarfksyms/crc32.c  |  0
 {tools => scripts}/gendwarfksyms/crc32.h  |  0
 .../gendwarfksyms/gendwarfksyms.c |  0
 .../gendwarfksyms/gendwarfksyms.h |  0
 {tools => scripts}/gendwarfksyms/symbols.c|  0
 {tools => scripts}/gendwarfksyms/types.c  |  0
 tools/Makefile|  7 ++-
 tools/gendwarfksyms/Build |  5 --
 tools/gendwarfksyms/Makefile  | 49 ---
 15 files changed, 17 insertions(+), 65 deletions(-)
 create mode 100644 scripts/gendwarfksyms/.gitignore
 create mode 100644 scripts/gendwarfksyms/Makefile
 rename {tools => scripts}/gendwarfksyms/cache.c (100%)
 rename {tools => scripts}/gendwarfksyms/crc32.c (100%)
 rename {tools => scripts}/gendwarfksyms/crc32.h (100%)
 rename {tools => scripts}/gendwarfksyms/gendwarfksyms.c (100%)
 rename {tools => scripts}/gendwarfksyms/gendwarfksyms.h (100%)
 rename {tools => scripts}/gendwarfksyms/symbols.c (100%)
 rename {tools => scripts}/gendwarfksyms/types.c (100%)
 delete mode 100644 tools/gendwarfksyms/Build
 delete mode 100644 tools/gendwarfksyms/Makefile

diff --git a/Makefile b/Makefile
index 1499b9352634..925a75b8ba7d 100644
--- a/Makefile
+++ b/Makefile
@@ -1344,12 +1344,6 @@ prepare: tools/bpf/resolve_btfids
 endif
 endif

Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up

2024-06-18 Thread Steven Rostedt

On Tue, 18 Jun 2024 13:47:49 +0200
Alexander Graf  wrote:

> IMHO the big fat disclaimer should be in the argument name. 
> "reserve_mem" to me sounds like it actually guarantees a reservation - 
> which it doesn't. Can we name it more along the lines of "debug" (to 
> indicate it's not for production data) or "phoenix" (usually gets reborn 
> out of ashes, but you can never know for sure): "debug_mem", / 
> "phoenix_mem"?

I don't see any reason it will not reserve memory. That doesn't seem to
be the issue.  What is not guaranteed is that it will be in the same
location as last time. So the name is correct. It's not
"reserve_consistent_memory" ;-)

-- Steve

Re: [PATCH v6 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up

2024-06-18 Thread Steven Rostedt

On Tue, 18 Jun 2024 20:55:17 +0800
Zhengyejian  wrote:

> > +   start = memblock_phys_alloc(size, align);
> > +   if (!start)
> > +   return -ENOMEM;
> > +
> > +   reserved_mem_add(start, size, name);
> > +
> > +   return 0;  
> 
> Hi, steve, should here return 1 ? Other __setup functions
> return 1 when it success.

Ug, you're correct.

Mike, want me to send a v7?

-- Steve

Re: [PATCH 2/2] arm64: dts: qcom: qcm6490-shift-otter: Name the regulators

2024-06-18 Thread Caleb Connolly





On 18/06/2024 15:30, Luca Weiss wrote:

Without explicitly specifying names for the regulators they are named
based on the DeviceTree node name. This results in multiple regulators
with the same name, making debug prints and regulator_summary impossible
to reason about.

Signed-off-by: Luca Weiss 


Reviewed-by: Caleb Connolly 

---
  arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts | 35 
  1 file changed, 35 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts 
b/arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts
index e82938cab953..4667e47a74bc 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts
@@ -235,46 +235,54 @@ regulators-0 {
qcom,pmic-id = "b";
  
  		vreg_s1b: smps1 {

+   regulator-name = "vreg_s1b";
regulator-min-microvolt = <184>;
regulator-max-microvolt = <204>;
};
  
  		vreg_s7b: smps7 {

+   regulator-name = "vreg_s7b";
regulator-min-microvolt = <535000>;
regulator-max-microvolt = <112>;
};
  
  		vreg_s8b: smps8 {

+   regulator-name = "vreg_s8b";
regulator-min-microvolt = <120>;
regulator-max-microvolt = <150>;
regulator-initial-mode = ;
};
  
  		vreg_l1b: ldo1 {

+   regulator-name = "vreg_l1b";
regulator-min-microvolt = <825000>;
regulator-max-microvolt = <925000>;
regulator-initial-mode = ;
};
  
  		vreg_l2b: ldo2 {

+   regulator-name = "vreg_l2b";
regulator-min-microvolt = <270>;
regulator-max-microvolt = <3544000>;
regulator-initial-mode = ;
};
  
  		vreg_l3b: ldo3 {

+   regulator-name = "vreg_l3b";
regulator-min-microvolt = <312000>;
regulator-max-microvolt = <91>;
regulator-initial-mode = ;
};
  
  		vreg_l6b: ldo6 {

+   regulator-name = "vreg_l6b";
regulator-min-microvolt = <114>;
regulator-max-microvolt = <126>;
regulator-initial-mode = ;
};
  
  		vreg_l7b: ldo7 {

+   regulator-name = "vreg_l7b";
/* Constrained for UFS VCC, at least until UFS driver 
scales voltage */
regulator-min-microvolt = <2952000>;
regulator-max-microvolt = <2952000>;
@@ -282,66 +290,77 @@ vreg_l7b: ldo7 {
};
  
  		vreg_l8b: ldo8 {

+   regulator-name = "vreg_l8b";
regulator-min-microvolt = <87>;
regulator-max-microvolt = <97>;
regulator-initial-mode = ;
};
  
  		vreg_l9b: ldo9 {

+   regulator-name = "vreg_l9b";
regulator-min-microvolt = <120>;
regulator-max-microvolt = <1304000>;
regulator-initial-mode = ;
};
  
  		vreg_l11b: ldo11 {

+   regulator-name = "vreg_l11b";
regulator-min-microvolt = <1504000>;
regulator-max-microvolt = <200>;
regulator-initial-mode = ;
};
  
  		vreg_l12b: ldo12 {

+   regulator-name = "vreg_l12b";
regulator-min-microvolt = <751000>;
regulator-max-microvolt = <824000>;
regulator-initial-mode = ;
};
  
  		vreg_l13b: ldo13 {

+   regulator-name = "vreg_l13b";
regulator-min-microvolt = <53>;
regulator-max-microvolt = <824000>;
regulator-initial-mode = ;
};
  
  		vreg_l14b: ldo14 {

+   regulator-name = "vreg_l14b";
regulator-min-microvolt = <108>;
regulator-max-microvolt = <1304000>;
regulator-initial-mode = ;
};
  
  		vreg_l15b: ldo15 {

+   regulator-name = "vreg_l15b";
regulator-min-microvolt = <765000>;
regulator-max-microvolt = <102>;
regulator-initial-mode = ;
};
  
  		vreg_l16b: ldo16 {

+   regulator-name = "vreg_l16b";
regulator-min-microvolt = <110>;

[RFC PATCH 1/4] kernel/reboot: Introduce pre_restart notifiers

2024-06-18 Thread Mathieu Desnoyers

Introduce a new pre_restart notifier chain for callbacks that need to
be executed after the system has been made quiescent with
syscore_shutdown(), before machine restart.

This pre_restart notifier chain should be invoked on machine restart and
on emergency machine restart.

The use-case for this new notifier chain is to preserve tracing data
within pmem areas on systems where the BIOS does not clear memory across
warm reboots.

Why do we need a new notifier chain ?

1) The reboot and restart_prepare notifiers are called too early in the
   reboot sequence: they are invoked before syscore_shutdown(), which
   leaves other CPUs actively running threads while those notifiers are
   invoked.

2) The "restart" notifier is meant to trigger the actual machine
   restart, and is not meant to be invoked as a last step immediately
   before restart. It is also not always used: some architecture code
   choose to bypass this restart notifier and reboot directly from the
   architecture code.

Wiring up the architecture code to call this notifier chain is left to
follow-up arch-specific patches.

Signed-off-by: Mathieu Desnoyers 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Ira Weiny 
Cc: Steven Rostedt 
Cc: nvd...@lists.linux.dev
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
---
 include/linux/reboot.h |  4 
 kernel/reboot.c| 51 ++
 2 files changed, 55 insertions(+)

diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index abcdde4df697..c7f340e81451 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -50,6 +50,10 @@ extern int register_restart_handler(struct notifier_block *);
 extern int unregister_restart_handler(struct notifier_block *);
 extern void do_kernel_restart(char *cmd);
 
+extern int register_pre_restart_handler(struct notifier_block *);
+extern int unregister_pre_restart_handler(struct notifier_block *);
+extern void do_kernel_pre_restart(char *cmd);
+
 /*
  * Architecture-specific implementations of sys_reboot commands.
  */
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 22c16e2564cc..b7287dd48d35 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -235,6 +235,57 @@ void do_kernel_restart(char *cmd)
atomic_notifier_call_chain(_handler_list, reboot_mode, cmd);
 }
 
+/*
+ * Notifier list for kernel code which wants to be called immediately
+ * before restarting the system.
+ */
+static ATOMIC_NOTIFIER_HEAD(pre_restart_handler_list);
+
+/**
+ * register_pre_restart_handler - Register function to be called in 
preparation
+ *to reset the system
+ * @nb: Info about handler function to be called
+ *
+ * Registers a function with code to be called in preparation to restart
+ * the system.
+ *
+ * Currently always returns zero, as atomic_notifier_chain_register()
+ * always returns zero.
+ */
+int register_pre_restart_handler(struct notifier_block *nb)
+{
+   return atomic_notifier_chain_register(_restart_handler_list, nb);
+}
+EXPORT_SYMBOL(register_pre_restart_handler);
+
+/**
+ * unregister_pre_restart_handler - Unregister previously registered
+ *  pre-restart handler
+ * @nb: Hook to be unregistered
+ *
+ * Unregisters a previously registered pre-restart handler function.
+ *
+ * Returns zero on success, or %-ENOENT on failure.
+ */
+int unregister_pre_restart_handler(struct notifier_block *nb)
+{
+   return atomic_notifier_chain_unregister(_restart_handler_list, nb);
+}
+EXPORT_SYMBOL(unregister_pre_restart_handler);
+
+/**
+ * do_kernel_pre_restart - Execute kernel pre-restart handler call chain
+ *
+ * Calls functions registered with register_pre_restart_handler.
+ *
+ * Expected to be called from machine_restart and
+ * machine_emergency_restart before invoking the restart handlers.
+ */
+void do_kernel_pre_restart(char *cmd)
+{
+   atomic_notifier_call_chain(_restart_handler_list, reboot_mode, cmd);
+}
+
 void migrate_to_reboot_cpu(void)
 {
/* The boot cpu is always logical cpu 0 */
-- 
2.39.2

[RFC PATCH 3/4] arm64: Invoke pre_restart notifiers

2024-06-18 Thread Mathieu Desnoyers

Invoke the pre_restart notifiers after shutdown, before machine restart.
This allows preserving pmem memory across warm reboots.

Invoke the pre_restart notifiers before emergency machine restart as
well to cover the panic() scenario.

Signed-off-by: Mathieu Desnoyers 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Ira Weiny 
Cc: Steven Rostedt 
Cc: nvd...@lists.linux.dev
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
---
 arch/arm64/kernel/process.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4ae31b7af6c3..4a27397617fb 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -129,6 +129,8 @@ void machine_restart(char *cmd)
local_irq_disable();
smp_send_stop();
 
+   do_kernel_pre_restart(cmd);
+
/*
 * UpdateCapsule() depends on the system being reset via
 * ResetSystem().
-- 
2.39.2

[RFC PATCH 2/4] nvdimm/pmem: Flush to memory before machine restart

2024-06-18 Thread Mathieu Desnoyers

Register pre-restart notifiers to flush pmem areas from CPU data cache
to memory on reboot, immediately before restarting the machine. This
ensures all other CPUs are quiescent before the pmem data is flushed to
memory.

I did an earlier POC that flushed caches on panic/die oops notifiers [1],
but it did not cover the reboot case. I've been made aware that some
distribution vendors have started shipping their own modified version of
my earlier POC patch. This makes a strong argument for upstreaming this
work.

Use the newly introduced "pre-restart" notifiers to flush pmem data to
memory immediately before machine restart.

Delta from my POC patch [1]:

Looking at the panic() code, it invokes emergency_restart() to restart
the machine, which uses the new pre-restart notifiers. There is
therefore no need to hook into panic handlers explicitly.

Looking at the die notifiers, those don't actually end up triggering
a machine restart, so it does not appear to be relevant to flush pmem
to memory there. I must admit I originally looked at how ftrace hooked
into panic/die-oops handlers for its ring buffers, but the use-case it
different here: we only want to cover machine restart use-cases.

Link: 
https://lore.kernel.org/linux-kernel/f6067e3e-a2bc-483d-b214-6e3fe6691...@efficios.com/
 [1]
Signed-off-by: Mathieu Desnoyers 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Ira Weiny 
Cc: Steven Rostedt 
Cc: nvd...@lists.linux.dev
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/nvdimm/pmem.c | 29 -
 drivers/nvdimm/pmem.h |  2 ++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 598fe2e89bda..bf1d187a9dca 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -26,12 +26,16 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "pmem.h"
 #include "btt.h"
 #include "pfn.h"
 #include "nd.h"
 
+static int pmem_pre_restart_handler(struct notifier_block *self,
+   unsigned long ev, void *unused);
+
 static struct device *to_dev(struct pmem_device *pmem)
 {
/*
@@ -423,6 +427,7 @@ static void pmem_release_disk(void *__pmem)
 {
struct pmem_device *pmem = __pmem;
 
+   unregister_pre_restart_notifier(>pre_restart_notifier);
dax_remove_host(pmem->disk);
kill_dax(pmem->dax_dev);
put_dax(pmem->dax_dev);
@@ -575,9 +580,14 @@ static int pmem_attach_disk(struct device *dev,
goto out_cleanup_dax;
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
}
-   rc = device_add_disk(dev, disk, pmem_attribute_groups);
+   pmem->pre_restart_notifier.notifier_call = pmem_pre_restart_handler;
+   pmem->pre_restart_notifier.priority = 0;
+   rc = register_pre_restart_notifier(>pre_restart_notifier);
if (rc)
goto out_remove_host;
+   rc = device_add_disk(dev, disk, pmem_attribute_groups);
+   if (rc)
+   goto out_unregister_reboot;
if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
return -ENOMEM;
 
@@ -589,6 +599,8 @@ static int pmem_attach_disk(struct device *dev,
dev_warn(dev, "'badblocks' notification disabled\n");
return 0;
 
+out_unregister_pre_restart:
+   unregister_pre_restart_notifier(>pre_restart_notifier);
 out_remove_host:
dax_remove_host(pmem->disk);
 out_cleanup_dax:
@@ -751,6 +763,21 @@ static void nd_pmem_notify(struct device *dev, enum 
nvdimm_event event)
}
 }
 
+/*
+ * For volatile memory use-cases where explicit flushing of the data cache is
+ * not useful after stores, the pmem reboot notifier is called on preparation
+ * for restart to make sure the content of the pmem memory area is flushed from
+ * data cache to memory, so it can be preserved across warm reboot.
+ */
+static int pmem_pre_restart_handler(struct notifier_block *self,
+   unsigned long ev, void *unused)
+{
+   struct pmem_device *pmem = container_of(self, struct pmem_device, 
pre_restart_notifier);
+
+   arch_wb_cache_pmem(pmem->virt_addr, pmem->size);
+   return NOTIFY_DONE;
+}
+
 MODULE_ALIAS("pmem");
 MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_IO);
 MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_PMEM);
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index 392b0b38acb9..b8a2a518cf82 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -27,6 +28,7 @@ struct pmem_device {
struct dax_device   *dax_dev;
struct gendisk  *disk;
struct dev_pagemap  pgmap;
+   struct notifier_block   pre_restart_notifier;
 };
 
 long __pmem_direct_access(struct pmem_device *pmem,

[RFC PATCH 0/4] Flush nvdimm/pmem to memory before machine restart

2024-06-18 Thread Mathieu Desnoyers

Introduce a new pre_restart notifier chain for callbacks that need to
be executed after the system has been made quiescent with
syscore_shutdown(), before machine restart.

Register pre-restart notifiers to flush pmem areas from CPU data cache
to memory on reboot, immediately before restarting the machine. This
ensures all other CPUs are quiescent before the pmem data is flushed to
memory.

The use-case for this new notifier chain is to preserve tracing data
within pmem areas on systems where the BIOS does not clear memory across
warm reboots.

I did an earlier POC that flushed caches on panic/die oops notifiers [1],
but it did not cover the reboot case. I've been made aware that some
distribution vendors have started shipping their own modified version of
my earlier POC patch. This makes a strong argument for upstreaming this
work.

Link: 
https://lore.kernel.org/linux-kernel/f6067e3e-a2bc-483d-b214-6e3fe6691...@efficios.com/
 [1]
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Ira Weiny 
Cc: nvd...@lists.linux.dev
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org

Mathieu Desnoyers (4):
  kernel/reboot: Introduce pre_restart notifiers
  nvdimm/pmem: Flush to memory before machine restart
  arm64: Invoke pre_restart notifiers
  x86: Invoke pre_restart notifiers

 arch/arm64/kernel/process.c |  2 ++
 arch/x86/kernel/reboot.c|  7 +++--
 drivers/nvdimm/pmem.c   | 29 -
 drivers/nvdimm/pmem.h   |  2 ++
 include/linux/reboot.h  |  4 +++
 kernel/reboot.c | 51 +
 6 files changed, 92 insertions(+), 3 deletions(-)

-- 
2.39.2

[RFC PATCH 4/4] x86: Invoke pre_restart notifiers

2024-06-18 Thread Mathieu Desnoyers

Invoke the pre_restart notifiers after shutdown, before machine restart.
This allows preserving pmem memory across warm reboots.

Invoke the pre_restart notifiers on emergency_machine_restart to cover
the panic() scenario.

Signed-off-by: Mathieu Desnoyers 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Ira Weiny 
Cc: Steven Rostedt 
Cc: nvd...@lists.linux.dev
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
---
 arch/x86/kernel/reboot.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f3130f762784..222619fa63c6 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -631,8 +631,10 @@ static void native_machine_emergency_restart(void)
int orig_reboot_type = reboot_type;
unsigned short mode;
 
-   if (reboot_emergency)
+   if (reboot_emergency) {
+   do_kernel_pre_restart(NULL);
emergency_reboot_disable_virtualization();
+   }
 
tboot_shutdown(TB_SHUTDOWN_REBOOT);
 
@@ -760,12 +762,13 @@ static void __machine_emergency_restart(int emergency)
machine_ops.emergency_restart();
 }
 
-static void native_machine_restart(char *__unused)
+static void native_machine_restart(char *cmd)
 {
pr_notice("machine restart\n");
 
if (!reboot_force)
machine_shutdown();
+   do_kernel_pre_restart(cmd);
__machine_emergency_restart(0);
 }
 
-- 
2.39.2

Re: [PATCH v4 35/35] kmsan: Enable on s390

2024-06-18 Thread Alexander Potapenko

On Thu, Jun 13, 2024 at 5:40 PM Ilya Leoshkevich  wrote:
>
> Now that everything else is in place, enable KMSAN in Kconfig.
>
> Acked-by: Heiko Carstens 
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko

Re: [PATCH v4 09/35] kmsan: Expose kmsan_get_metadata()

2024-06-18 Thread Alexander Potapenko

On Thu, Jun 13, 2024 at 5:39 PM Ilya Leoshkevich  wrote:
>
> Each s390 CPU has lowcore pages associated with it. Each CPU sees its
> own lowcore at virtual address 0 through a hardware mechanism called
> prefixing. Additionally, all lowcores are mapped to non-0 virtual
> addresses stored in the lowcore_ptr[] array.
>
> When lowcore is accessed through virtual address 0, one needs to
> resolve metadata for lowcore_ptr[raw_smp_processor_id()].
>
> Expose kmsan_get_metadata() to make it possible to do this from the
> arch code.
>
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko

Re: [PATCH v4 12/35] kmsan: Support SLAB_POISON

2024-06-18 Thread Alexander Potapenko

On Fri, Jun 14, 2024 at 1:44 AM Ilya Leoshkevich  wrote:
>
> On Thu, 2024-06-13 at 16:30 -0700, SeongJae Park wrote:
> > Hi Ilya,
> >
> > On Thu, 13 Jun 2024 17:34:14 +0200 Ilya Leoshkevich
> >  wrote:
> >
> > > Avoid false KMSAN negatives with SLUB_DEBUG by allowing
> > > kmsan_slab_free() to poison the freed memory, and by preventing
> > > init_object() from unpoisoning new allocations by using __memset().
> > >
> > > There are two alternatives to this approach. First, init_object()
> > > can be marked with __no_sanitize_memory. This annotation should be
> > > used
> > > with great care, because it drops all instrumentation from the
> > > function, and any shadow writes will be lost. Even though this is
> > > not a
> > > concern with the current init_object() implementation, this may
> > > change
> > > in the future.
> > >
> > > Second, kmsan_poison_memory() calls may be added after memset()
> > > calls.
> > > The downside is that init_object() is called from
> > > free_debug_processing(), in which case poisoning will erase the
> > > distinction between simply uninitialized memory and UAF.
> > >
> > > Signed-off-by: Ilya Leoshkevich 
> > > ---
> > >  mm/kmsan/hooks.c |  2 +-
> > >  mm/slub.c| 13 +
> > >  2 files changed, 10 insertions(+), 5 deletions(-)
> > >
> > [...]
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1139,7 +1139,12 @@ static void init_object(struct kmem_cache
> > > *s, void *object, u8 val)
> > > unsigned int poison_size = s->object_size;
> > >
> > > if (s->flags & SLAB_RED_ZONE) {
> > > -   memset(p - s->red_left_pad, val, s->red_left_pad);
> > > +   /*
> > > +* Use __memset() here and below in order to avoid
> > > overwriting
> > > +* the KMSAN shadow. Keeping the shadow makes it
> > > possible to
> > > +* distinguish uninit-value from use-after-free.
> > > +*/
> > > +   __memset(p - s->red_left_pad, val, s-
> > > >red_left_pad);
> >
> > I found my build test[1] fails with below error on latest mm-unstable
> > branch.
> > 'git bisect' points me this patch.
> >
> >   CC  mm/slub.o
> > /mm/slub.c: In function 'init_object':
> > /mm/slub.c:1147:17: error: implicit declaration of function
> > '__memset'; did you mean 'memset'? [-Werror=implicit-function-
> > declaration]
> >  1147 | __memset(p - s->red_left_pad, val, s-
> > >red_left_pad);
> >   | ^~~~
> >   | memset
> > cc1: some warnings being treated as errors
> >
> > I haven't looked in deep, but reporting first.  Do you have any idea?
> >
> > [1]
> > https://github.com/awslabs/damon-tests/blob/next/corr/tests/build_m68k.sh
> >
> >
> > Thanks,
> > SJ
> >
> > [...]
>
> Thanks for the report.
>
> Apparently not all architectures have __memset(). We should probably go
> back to memset_no_sanitize_memory() [1], but this time mark it with
> noinline __maybe_unused __no_sanitize_memory, like it's done in, e.g.,
> 32/35.
>
> Alexander, what do you think?

We could probably go without __no_sanitize_memory assuming that
platforms supporting KMSAN always have __memset():

  #if defined(CONFIG_KMSAN)
  static inline void *memset_no_sanitize_memory(void *s, int c, size_t n)
  {
  return __memset(s, c, n);
  }
  #else
  static inline void *memset_no_sanitize_memory(void *s, int c, size_t n)
  {
  return memset(s, c, n);
  }
  #endif

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-06-18 Thread Steven Rostedt

On Thu, 13 Jun 2024 10:32:24 +0300
Ilkka Naulapää  wrote:

> ok, so if you don't have any idea where this bug is after those debug
> patches, I'll try to find some time to bisect it as a last resort.
> Stay tuned.

FYI,

I just debugged a strange crash that was caused by my config having
something leftover from your config. Specifically, that was:

CONFIG_FORCE_NR_CPUS

Do you get any warning about nr cpus not matching at boot up?

Regardless, can you disable that and see if you still get the same
crash.

Thanks,

-- Steve

Re: [PATCH v4 16/35] mm: slub: Unpoison the memchr_inv() return value

2024-06-18 Thread Alexander Potapenko

On Thu, Jun 13, 2024 at 5:39 PM Ilya Leoshkevich  wrote:
>
> Even though the KMSAN warnings generated by memchr_inv() are suppressed
> by metadata_access_enable(), its return value may still be poisoned.
>
> The reason is that the last iteration of memchr_inv() returns
> `*start != value ? start : NULL`, where *start is poisoned. Because of
> this, somewhat counterintuitively, the shadow value computed by
> visitSelectInst() is equal to `(uintptr_t)start`.
>
> The intention behind guarding memchr_inv() behind
> metadata_access_enable() is to touch poisoned metadata without
> triggering KMSAN, so unpoison its return value.

What do you think about applying __no_kmsan_checks to these functions instead?

Re: [PATCH v4 14/35] kmsan: Do not round up pg_data_t size

2024-06-18 Thread Alexander Potapenko

On Thu, Jun 13, 2024 at 5:39 PM Ilya Leoshkevich  wrote:
>
> x86's alloc_node_data() rounds up node data size to PAGE_SIZE. It's not
> explained why it's needed, but it's most likely for performance
> reasons, since the padding bytes are not used anywhere. Some other
> architectures do it as well, e.g., mips rounds it up to the cache line
> size.
>
> kmsan_init_shadow() initializes metadata for each node data and assumes
> the x86 rounding, which does not match other architectures. This may
> cause the range end to overshoot the end of available memory, in turn
> causing virt_to_page_or_null() in kmsan_init_alloc_meta_for_range() to
> return NULL, which leads to kernel panic shortly after.
>
> Since the padding bytes are not used, drop the rounding.

Nice catch, thanks!

> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko

Re: [PATCH v4 15/35] mm: slub: Let KMSAN access metadata

2024-06-18 Thread Alexander Potapenko

On Thu, Jun 13, 2024 at 5:39 PM Ilya Leoshkevich  wrote:
>
> Building the kernel with CONFIG_SLUB_DEBUG and CONFIG_KMSAN causes
> KMSAN to complain about touching redzones in kfree().
>
> Fix by extending the existing KASAN-related metadata_access_enable()
> and metadata_access_disable() functions to KMSAN.
>
> Acked-by: Vlastimil Babka 
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko

Re: [PATCH] riscv: Fix early ftrace nop patching

2024-06-18 Thread Alexandre Ghiti

Hi Andy,

On Tue, Jun 18, 2024 at 2:48 PM Andy Chiu  wrote:
>
> On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti  wrote:
> >
> > Hi Conor,
> >
> > On 17/06/2024 15:23, Alexandre Ghiti wrote:
> > > Hi Conor,
> > >
> > > Sorry for the delay here.
> > >
> > > On 13/06/2024 09:48, Conor Dooley wrote:
> > >> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
> > >>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> > >>> converted ftrace_make_nop() to use patch_insn_write() which does not
> > >>> emit any icache flush relying entirely on __ftrace_modify_code() to do
> > >>> that.
> > >>>
> > >>> But we missed that ftrace_make_nop() was called very early directly
> > >>> when
> > >>> converting mcount calls into nops (actually on riscv it converts 2B
> > >>> nops
> > >>> emitted by the compiler into 4B nops).
> > >>>
> > >>> This caused crashes on multiple HW as reported by Conor and Björn since
> > >>> the booting core could have half-patched instructions in its icache
> > >>> which would trigger an illegal instruction trap: fix this by emitting a
> > >>> local flush icache when early patching nops.
> > >>>
> > >>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> > >>> Signed-off-by: Alexandre Ghiti 
> > >>> ---
> > >>>   arch/riscv/include/asm/cacheflush.h | 6 ++
> > >>>   arch/riscv/kernel/ftrace.c  | 3 +++
> > >>>   2 files changed, 9 insertions(+)
> > >>>
> > >>> diff --git a/arch/riscv/include/asm/cacheflush.h
> > >>> b/arch/riscv/include/asm/cacheflush.h
> > >>> index dd8d07146116..ce79c558a4c8 100644
> > >>> --- a/arch/riscv/include/asm/cacheflush.h
> > >>> +++ b/arch/riscv/include/asm/cacheflush.h
> > >>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
> > >>>   asm volatile ("fence.i" ::: "memory");
> > >>>   }
> > >>>   +static inline void local_flush_icache_range(unsigned long start,
> > >>> +unsigned long end)
> > >>> +{
> > >>> +local_flush_icache_all();
> > >>> +}
> > >>> +
> > >>>   #define PG_dcache_clean PG_arch_1
> > >>> static inline void flush_dcache_folio(struct folio *folio)
> > >>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > >>> index 4f4987a6d83d..32e7c401dfb4 100644
> > >>> --- a/arch/riscv/kernel/ftrace.c
> > >>> +++ b/arch/riscv/kernel/ftrace.c
> > >>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
> > >>> dyn_ftrace *rec)
> > >>>   out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > >>>   mutex_unlock(_mutex);
> > >> So, turns out that this patch is not sufficient. I've seen some more
> > >> crashes, seemingly due to initialising nops on this mutex_unlock().
> > >> Palmer suggested moving the if (!mod) ... into the lock, which fixed
> > >> the problem for me.
> > >
> > >
> > > Ok, it makes sense, I completely missed that. I'll send a fix for that
> > > shortly so that it can be merged in rc5.
> > >
> > > Thanks,
> > >
> > > Alex
> >
> >
> > So I digged a bit more and I'm afraid that the only way to make sure
> > this issue does not happen elsewhere is to flush the icache right after
> > the patching. We actually can't wait to batch the icache flush since
> > along the way, we may call a function that has just been patched (the
> > issue that you encountered here).
>
> Trying to provide my thoughts, please let me know if I missed
> anything. I think what Conor suggested is safe for init_nop, as it
> would be called only when there is only one core (booting) and at the
> loading stage of kernel modules. In the first case we just have to
> make sure there is no patchable entry before the core executes
> fence.i. The second case is unconditionally safe because there is no
> read-side of the race.
>
> It is a bit tricky for the generic (runtime) case of ftrace code
> patching, but that is not because of the batch fence.i maintenance. As
> long as there exists a patchable entry for the stopping thread, it is
> possible for them to execute on a partially patched instruction. A
> solution for this is again to prevent any patchable entry in the
> stop_machine's stopping thread. Another solution is to apply the
> atomic ftrace patching series which aims to get rid of the race.

Yeah but my worry is that we can't make sure that functions called
between the patching and the fence.i are not the ones that just get
patched. At least today, patch_insn_write() etc should be marked as
noinstr. But even then, we call core functions that could be patchable
(I went through all and it *seems* we are safe *for now*, but this is
very fragile).

Then what do you think we should do to fix Conor's issue right now:
should we mark the riscv specific functions as noinstr, cross fingers
that the core functions are not (and don't become) patchable and wait
for your atomic patching? Or should we flush as soon as possible as I
proposed above (which to me fixes the issue but hurts performance)?

Thanks,

Alex

>
> >
> > I don't know how much it

Re: [PATCH 2/2] arm64: dts: qcom: qcm6490-shift-otter: Name the regulators

2024-06-18 Thread Konrad Dybcio





On 6/18/24 15:30, Luca Weiss wrote:

Without explicitly specifying names for the regulators they are named
based on the DeviceTree node name. This results in multiple regulators
with the same name, making debug prints and regulator_summary impossible
to reason about.

Signed-off-by: Luca Weiss 
---


Reviewed-by: Konrad Dybcio 

Konrad

Re: [PATCH v3 3/5] arm64: dts: qcom: sdx75: update reserved memory regions for mpss

2024-06-18 Thread Konrad Dybcio





On 6/18/24 15:13, Naina Mehta wrote:

Rename qdss@8880 memory region as qlink_logging memory region
and add qdss_mem memory region at address of 0x8850.
Split mpss_dsmharq_mem region into 2 separate regions and
reduce the size of mpssadsp_mem region.

Signed-off-by: Naina Mehta 
---


Alright, we're getting somewhere. The commit message should however motivate
why such changes are necessary. For all we know, the splitting in two is
currently done for no reason, as qdss_mem and qlink_logging_mem are contiguous
- does the firmware have some expectations about them being separate?

Konrad

Re: [PATCH 1/2] arm64: dts: qcom: qcm6490-fairphone-fp5: Name the regulators

2024-06-18 Thread Konrad Dybcio





On 6/18/24 15:30, Luca Weiss wrote:

Without explicitly specifying names for the regulators they are named
based on the DeviceTree node name. This results in multiple regulators
with the same name, making debug prints and regulator_summary impossible
to reason about.

Signed-off-by: Luca Weiss 
---


Reviewed-by: Konrad Dybcio 

Konrad

[PATCH 2/2] arm64: dts: qcom: qcm6490-shift-otter: Name the regulators

2024-06-18 Thread Luca Weiss

Without explicitly specifying names for the regulators they are named
based on the DeviceTree node name. This results in multiple regulators
with the same name, making debug prints and regulator_summary impossible
to reason about.

Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts | 35 
 1 file changed, 35 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts 
b/arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts
index e82938cab953..4667e47a74bc 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts
@@ -235,46 +235,54 @@ regulators-0 {
qcom,pmic-id = "b";
 
vreg_s1b: smps1 {
+   regulator-name = "vreg_s1b";
regulator-min-microvolt = <184>;
regulator-max-microvolt = <204>;
};
 
vreg_s7b: smps7 {
+   regulator-name = "vreg_s7b";
regulator-min-microvolt = <535000>;
regulator-max-microvolt = <112>;
};
 
vreg_s8b: smps8 {
+   regulator-name = "vreg_s8b";
regulator-min-microvolt = <120>;
regulator-max-microvolt = <150>;
regulator-initial-mode = ;
};
 
vreg_l1b: ldo1 {
+   regulator-name = "vreg_l1b";
regulator-min-microvolt = <825000>;
regulator-max-microvolt = <925000>;
regulator-initial-mode = ;
};
 
vreg_l2b: ldo2 {
+   regulator-name = "vreg_l2b";
regulator-min-microvolt = <270>;
regulator-max-microvolt = <3544000>;
regulator-initial-mode = ;
};
 
vreg_l3b: ldo3 {
+   regulator-name = "vreg_l3b";
regulator-min-microvolt = <312000>;
regulator-max-microvolt = <91>;
regulator-initial-mode = ;
};
 
vreg_l6b: ldo6 {
+   regulator-name = "vreg_l6b";
regulator-min-microvolt = <114>;
regulator-max-microvolt = <126>;
regulator-initial-mode = ;
};
 
vreg_l7b: ldo7 {
+   regulator-name = "vreg_l7b";
/* Constrained for UFS VCC, at least until UFS driver 
scales voltage */
regulator-min-microvolt = <2952000>;
regulator-max-microvolt = <2952000>;
@@ -282,66 +290,77 @@ vreg_l7b: ldo7 {
};
 
vreg_l8b: ldo8 {
+   regulator-name = "vreg_l8b";
regulator-min-microvolt = <87>;
regulator-max-microvolt = <97>;
regulator-initial-mode = ;
};
 
vreg_l9b: ldo9 {
+   regulator-name = "vreg_l9b";
regulator-min-microvolt = <120>;
regulator-max-microvolt = <1304000>;
regulator-initial-mode = ;
};
 
vreg_l11b: ldo11 {
+   regulator-name = "vreg_l11b";
regulator-min-microvolt = <1504000>;
regulator-max-microvolt = <200>;
regulator-initial-mode = ;
};
 
vreg_l12b: ldo12 {
+   regulator-name = "vreg_l12b";
regulator-min-microvolt = <751000>;
regulator-max-microvolt = <824000>;
regulator-initial-mode = ;
};
 
vreg_l13b: ldo13 {
+   regulator-name = "vreg_l13b";
regulator-min-microvolt = <53>;
regulator-max-microvolt = <824000>;
regulator-initial-mode = ;
};
 
vreg_l14b: ldo14 {
+   regulator-name = "vreg_l14b";
regulator-min-microvolt = <108>;
regulator-max-microvolt = <1304000>;
regulator-initial-mode = ;
};
 
vreg_l15b: ldo15 {
+   regulator-name = "vreg_l15b";
regulator-min-microvolt = <765000>;
regulator-max-microvolt = <102>;
regulator-initial-mode = ;
};
 
vreg_l16b: ldo16 {
+   regulator-name = "vreg_l16b";

[PATCH 1/2] arm64: dts: qcom: qcm6490-fairphone-fp5: Name the regulators

2024-06-18 Thread Luca Weiss

Without explicitly specifying names for the regulators they are named
based on the DeviceTree node name. This results in multiple regulators
with the same name, making debug prints and regulator_summary impossible
to reason about.

Signed-off-by: Luca Weiss 
---
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 35 ++
 1 file changed, 35 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts 
b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
index 25ed74d4ebd2..c66c7eda6a69 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
@@ -271,46 +271,54 @@ regulators-0 {
qcom,pmic-id = "b";
 
vreg_s1b: smps1 {
+   regulator-name = "vreg_s1b";
regulator-min-microvolt = <184>;
regulator-max-microvolt = <204>;
};
 
vreg_s7b: smps7 {
+   regulator-name = "vreg_s7b";
regulator-min-microvolt = <535000>;
regulator-max-microvolt = <112>;
};
 
vreg_s8b: smps8 {
+   regulator-name = "vreg_s8b";
regulator-min-microvolt = <120>;
regulator-max-microvolt = <150>;
regulator-initial-mode = ;
};
 
vreg_l1b: ldo1 {
+   regulator-name = "vreg_l1b";
regulator-min-microvolt = <825000>;
regulator-max-microvolt = <925000>;
regulator-initial-mode = ;
};
 
vreg_l2b: ldo2 {
+   regulator-name = "vreg_l2b";
regulator-min-microvolt = <270>;
regulator-max-microvolt = <3544000>;
regulator-initial-mode = ;
};
 
vreg_l3b: ldo3 {
+   regulator-name = "vreg_l3b";
regulator-min-microvolt = <312000>;
regulator-max-microvolt = <91>;
regulator-initial-mode = ;
};
 
vreg_l6b: ldo6 {
+   regulator-name = "vreg_l6b";
regulator-min-microvolt = <114>;
regulator-max-microvolt = <126>;
regulator-initial-mode = ;
};
 
vreg_l7b: ldo7 {
+   regulator-name = "vreg_l7b";
/* Constrained for UFS VCC, at least until UFS driver 
scales voltage */
regulator-min-microvolt = <2952000>;
regulator-max-microvolt = <2952000>;
@@ -318,66 +326,77 @@ vreg_l7b: ldo7 {
};
 
vreg_l8b: ldo8 {
+   regulator-name = "vreg_l8b";
regulator-min-microvolt = <87>;
regulator-max-microvolt = <97>;
regulator-initial-mode = ;
};
 
vreg_l9b: ldo9 {
+   regulator-name = "vreg_l9b";
regulator-min-microvolt = <120>;
regulator-max-microvolt = <1304000>;
regulator-initial-mode = ;
};
 
vreg_l11b: ldo11 {
+   regulator-name = "vreg_l11b";
regulator-min-microvolt = <1504000>;
regulator-max-microvolt = <200>;
regulator-initial-mode = ;
};
 
vreg_l12b: ldo12 {
+   regulator-name = "vreg_l12b";
regulator-min-microvolt = <751000>;
regulator-max-microvolt = <824000>;
regulator-initial-mode = ;
};
 
vreg_l13b: ldo13 {
+   regulator-name = "vreg_l13b";
regulator-min-microvolt = <53>;
regulator-max-microvolt = <824000>;
regulator-initial-mode = ;
};
 
vreg_l14b: ldo14 {
+   regulator-name = "vreg_l14b";
regulator-min-microvolt = <108>;
regulator-max-microvolt = <1304000>;
regulator-initial-mode = ;
};
 
vreg_l15b: ldo15 {
+   regulator-name = "vreg_l15b";
regulator-min-microvolt = <765000>;
regulator-max-microvolt = <102>;
regulator-initial-mode = ;
};
 
vreg_l16b: ldo16 {
+   regulator-name =

[PATCH 0/2] Name the regulators for QCM6490 Fairphone 5 & SHIFTphone 8

2024-06-18 Thread Luca Weiss

As per individual commit messages:

Without explicitly specifying names for the regulators they are named
based on the DeviceTree node name. This results in multiple regulators
with the same name, making debug prints and regulator_summary impossible
to reason about.

Signed-off-by: Luca Weiss 
---
Luca Weiss (2):
  arm64: dts: qcom: qcm6490-fairphone-fp5: Name the regulators
  arm64: dts: qcom: qcm6490-shift-otter: Name the regulators

 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 35 ++
 arch/arm64/boot/dts/qcom/qcm6490-shift-otter.dts   | 35 ++
 2 files changed, 70 insertions(+)
---
base-commit: 76db4c64526c5e8ba0f56ad3d890dce8f9b00bbc
change-id: 20240618-qcm6490-regulator-name-bd20a8f17751

Best regards,
-- 
Luca Weiss

[PATCH v3 5/5] arm64: dts: qcom: sdx75-idp: enable MPSS remoteproc node

2024-06-18 Thread Naina Mehta

Enable MPSS remoteproc node on sdx75-idp platform.

Signed-off-by: Naina Mehta 
Reviewed-by: Dmitry Baryshkov 
---
 arch/arm64/boot/dts/qcom/sdx75-idp.dts | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdx75-idp.dts 
b/arch/arm64/boot/dts/qcom/sdx75-idp.dts
index fde16308c7e2..f1bbe7ab01ab 100644
--- a/arch/arm64/boot/dts/qcom/sdx75-idp.dts
+++ b/arch/arm64/boot/dts/qcom/sdx75-idp.dts
@@ -282,6 +282,12 @@ _id_0 {
status = "okay";
 };
 
+_mpss {
+   firmware-name = "qcom/sdx75/modem.mbn",
+   "qcom/sdx75/modem_dtb.mbn";
+   status = "okay";
+};
+
  {
cd-gpios = < 103 GPIO_ACTIVE_LOW>;
vmmc-supply = <_2v95_vdd>;
-- 
2.34.1

[PATCH v3 4/5] arm64: dts: qcom: sdx75: Add remoteproc node

2024-06-18 Thread Naina Mehta

Add MPSS remoteproc node for SDX75 SoC.

Signed-off-by: Naina Mehta 
Reviewed-by: Dmitry Baryshkov 
---
 arch/arm64/boot/dts/qcom/sdx75.dtsi | 47 +
 1 file changed, 47 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdx75.dtsi 
b/arch/arm64/boot/dts/qcom/sdx75.dtsi
index 6f0abcc87a3b..7cf3fcb469a8 100644
--- a/arch/arm64/boot/dts/qcom/sdx75.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdx75.dtsi
@@ -891,6 +891,53 @@ tcsr: syscon@1fc {
reg = <0x0 0x01fc 0x0 0x3>;
};
 
+   remoteproc_mpss: remoteproc@408 {
+   compatible = "qcom,sdx75-mpss-pas";
+   reg = <0 0x0408 0 0x4040>;
+
+   interrupts-extended = < GIC_SPI 250 
IRQ_TYPE_EDGE_RISING>,
+ <_modem_in 0 
IRQ_TYPE_EDGE_RISING>,
+ <_modem_in 1 
IRQ_TYPE_EDGE_RISING>,
+ <_modem_in 2 
IRQ_TYPE_EDGE_RISING>,
+ <_modem_in 3 
IRQ_TYPE_EDGE_RISING>,
+ <_modem_in 7 
IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog",
+ "fatal",
+ "ready",
+ "handover",
+ "stop-ack",
+ "shutdown-ack";
+
+   clocks = < RPMH_CXO_CLK>;
+   clock-names = "xo";
+
+   power-domains = < RPMHPD_CX>,
+   < RPMHPD_MSS>;
+   power-domain-names = "cx",
+"mss";
+
+   memory-region = <_mem>, <_mpss_dtb_mem>,
+   <_dsm_mem>, <_dsm_mem_2>,
+   <_logging_mem>;
+
+   qcom,qmp = <_qmp>;
+
+   qcom,smem-states = <_modem_out 0>;
+   qcom,smem-state-names = "stop";
+
+   status = "disabled";
+
+   glink-edge {
+   interrupts-extended = < IPCC_CLIENT_MPSS
+
IPCC_MPROC_SIGNAL_PING
+
IRQ_TYPE_EDGE_RISING>;
+   mboxes = < IPCC_CLIENT_MPSS
+   IPCC_MPROC_SIGNAL_PING>;
+   label = "mpss";
+   qcom,remote-pid = <1>;
+   };
+   };
+
sdhc: mmc@8804000 {
compatible = "qcom,sdx75-sdhci", "qcom,sdhci-msm-v5";
reg = <0x0 0x08804000 0x0 0x1000>;
-- 
2.34.1

[PATCH v3 3/5] arm64: dts: qcom: sdx75: update reserved memory regions for mpss

2024-06-18 Thread Naina Mehta

Rename qdss@8880 memory region as qlink_logging memory region
and add qdss_mem memory region at address of 0x8850.
Split mpss_dsmharq_mem region into 2 separate regions and
reduce the size of mpssadsp_mem region.

Signed-off-by: Naina Mehta 
---
 arch/arm64/boot/dts/qcom/sdx75.dtsi | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sdx75.dtsi 
b/arch/arm64/boot/dts/qcom/sdx75.dtsi
index 9b93f6501d55..6f0abcc87a3b 100644
--- a/arch/arm64/boot/dts/qcom/sdx75.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdx75.dtsi
@@ -366,7 +366,12 @@ uefi_log_mem: uefi-log@87f75000 {
no-map;
};
 
-   qdss_mem: qdss@8880 {
+   qdss_mem: qdss@8850 {
+   reg = <0x0 0x8850 0x0 0x30>;
+   no-map;
+   };
+
+   qlink_logging_mem: qlink-logging@8880 {
reg = <0x0 0x8880 0x0 0x30>;
no-map;
};
@@ -377,8 +382,13 @@ audio_heap_mem: audio-heap@88b0 {
no-map;
};
 
-   mpss_dsmharq_mem: mpss-dsmharq@88f0 {
-   reg = <0x0 0x88f0 0x0 0x508>;
+   mpss_dsm_mem_2: mpss-dsm-2@88f0 {
+   reg = <0x0 0x88f0 0x0 0x250>;
+   no-map;
+   };
+
+   mpss_dsm_mem: mpss-dsm@8b40 {
+   reg = <0x0 0x8b40 0x0 0x2b8>;
no-map;
};
 
@@ -388,7 +398,7 @@ q6_mpss_dtb_mem: q6-mpss-dtb@8df8 {
};
 
mpssadsp_mem: mpssadsp@8e00 {
-   reg = <0x0 0x8e00 0x0 0xf40>;
+   reg = <0x0 0x8e00 0x0 0xf10>;
no-map;
};
 
-- 
2.34.1

[PATCH v3 2/5] remoteproc: qcom: pas: Add SDX75 remoteproc support

2024-06-18 Thread Naina Mehta

Add MPSS Peripheral Authentication Service support for SDX75 platform.

Signed-off-by: Naina Mehta 
---
 drivers/remoteproc/qcom_q6v5_pas.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
b/drivers/remoteproc/qcom_q6v5_pas.c
index 8458bcfe9e19..833e2f9c2c5e 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -1343,6 +1343,7 @@ static const struct of_device_id adsp_of_match[] = {
{ .compatible = "qcom,sdm845-cdsp-pas", .data = 
_cdsp_resource_init},
{ .compatible = "qcom,sdm845-slpi-pas", .data = 
_slpi_resource_init},
{ .compatible = "qcom,sdx55-mpss-pas", .data = _mpss_resource},
+   { .compatible = "qcom,sdx75-mpss-pas", .data = _mpss_resource},
{ .compatible = "qcom,sm6115-adsp-pas", .data = _resource_init},
{ .compatible = "qcom,sm6115-cdsp-pas", .data = _resource_init},
{ .compatible = "qcom,sm6115-mpss-pas", .data = _mpss_resource},
-- 
2.34.1

[PATCH v3 1/5] dt-bindings: remoteproc: qcom,sm8550-pas: document the SDX75 PAS

2024-06-18 Thread Naina Mehta

Document the MPSS Peripheral Authentication Service on SDX75 platform.

Signed-off-by: Naina Mehta 
Reviewed-by: Krzysztof Kozlowski 
---
 .../devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml| 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml 
b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml
index 73fda7565cd1..d7fad7b3c2c6 100644
--- a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml
@@ -16,6 +16,7 @@ description:
 properties:
   compatible:
 enum:
+  - qcom,sdx75-mpss-pas
   - qcom,sm8550-adsp-pas
   - qcom,sm8550-cdsp-pas
   - qcom,sm8550-mpss-pas
@@ -113,6 +114,7 @@ allOf:
   properties:
 compatible:
   enum:
+- qcom,sdx75-mpss-pas
 - qcom,sm8650-mpss-pas
 then:
   properties:
@@ -146,6 +148,7 @@ allOf:
   properties:
 compatible:
   enum:
+- qcom,sdx75-mpss-pas
 - qcom,sm8550-mpss-pas
 - qcom,sm8650-mpss-pas
 then:
-- 
2.34.1

[PATCH v3 0/5] Add MPSS remoteproc support for SDX75

2024-06-18 Thread Naina Mehta

Add modem support to SDX75 using the PAS remoteproc driver.
Also, add qlink_logging memory region and split MPSS DSM
region into 2 separate regions.

These patches were co-authored by Rohit Agarwal while at
Qualcomm.

Changes in v3:
 - Updated commit message for reserved memory updation for mpss.
 - Link to v2: 
https://lore.kernel.org/all/20240617093428.3616194-1-quic_nainm...@quicinc.com/

Changes in v2:
 - Added missing binding for SDX75 to allOf constraints.
 - Updated reserved memory node names to remove underscores.
 - Link to v1: 
https://lore.kernel.org/all/20240606143858.4026-1-quic_nainm...@quicinc.com/

Naina Mehta (5):
  dt-bindings: remoteproc: qcom,sm8550-pas: document the SDX75 PAS
  remoteproc: qcom: pas: Add SDX75 remoteproc support
  arm64: dts: qcom: sdx75: update reserved memory regions for mpss
  arm64: dts: qcom: sdx75: Add remoteproc node
  arm64: dts: qcom: sdx75-idp: enable MPSS remoteproc node

 .../bindings/remoteproc/qcom,sm8550-pas.yaml  |  3 +
 arch/arm64/boot/dts/qcom/sdx75-idp.dts|  6 ++
 arch/arm64/boot/dts/qcom/sdx75.dtsi   | 65 +--
 drivers/remoteproc/qcom_q6v5_pas.c|  1 +
 4 files changed, 71 insertions(+), 4 deletions(-)

-- 
2.34.1

Re: [PATCH v6 1/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up

2024-06-18 Thread Zhengyejian


On 2024/6/13 23:55, Steven Rostedt wrote:

From: "Steven Rostedt (Google)" 

In order to allow for requesting a memory region that can be used for
things like pstore on multiple machines where the memory layout is not the
same, add a new option to the kernel command line called "reserve_mem".

The format is:  reserve_mem=nn:align:name

Where it will find nn amount of memory at the given alignment of align.
The name field is to allow another subsystem to retrieve where the memory
was found. For example:

   reserve_mem=12M:4096:oops ramoops.mem_name=oops

Where ramoops.mem_name will tell ramoops that memory was reserved for it
via the reserve_mem option and it can find it by calling:

   if (reserve_mem_find_by_name("oops", , )) {
// start holds the start address and size holds the size given

This is typically used for systems that do not wipe the RAM, and this
command line will try to reserve the same physical memory on soft reboots.
Note, it is not guaranteed to be the same location. For example, if KASLR
places the kernel at the location of where the RAM reservation was from a
previous boot, the new reservation will be at a different location.  Any
subsystem using this feature must add a way to verify that the contents of
the physical memory is from a previous boot, as there may be cases where
the memory will not be located at the same location.

Not all systems may work either. There could be bit flips if the reboot
goes through the BIOS. Using kexec to reboot the machine is likely to
have better results in such cases.

Link: https://lore.kernel.org/all/zjjvnzux3nzig...@kernel.org/

Suggested-by: Mike Rapoport 
Tested-by: Guilherme G. Piccoli 
Signed-off-by: Steven Rostedt (Google) 
---
  .../admin-guide/kernel-parameters.txt |  22 
  include/linux/mm.h|   2 +
  mm/memblock.c | 117 ++
  3 files changed, 141 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index b600df82669d..56e18b1a520d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5710,6 +5710,28 @@
them.  If  is less than 0x1, the region
is assumed to be I/O ports; otherwise it is memory.
  
+	reserve_mem=	[RAM]

+   Format: nn[KNG]::
+   Reserve physical memory and label it with a name that
+   other subsystems can use to access it. This is typically
+   used for systems that do not wipe the RAM, and this 
command
+   line will try to reserve the same physical memory on
+   soft reboots. Note, it is not guaranteed to be the same
+   location. For example, if anything about the system 
changes
+   or if booting a different kernel. It can also fail if 
KASLR
+   places the kernel at the location of where the RAM 
reservation
+   was from a previous boot, the new reservation will be 
at a
+   different location.
+   Any subsystem using this feature must add a way to 
verify
+   that the contents of the physical memory is from a 
previous
+   boot, as there may be cases where the memory will not be
+   located at the same location.
+
+   The format is size:align:label for example, to request
+   12 megabytes of 4096 alignment for ramoops:
+
+   reserve_mem=12M:4096:oops ramoops.mem_name=oops
+
reservetop= [X86-32,EARLY]
Format: nn[KMG]
Reserves a hole at the top of the kernel virtual
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9849dfda44d4..077fb589b88a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4263,4 +4263,6 @@ static inline bool pfn_is_unaccepted_memory(unsigned long 
pfn)
  void vma_pgtable_walk_begin(struct vm_area_struct *vma);
  void vma_pgtable_walk_end(struct vm_area_struct *vma);
  
+int reserve_mem_find_by_name(const char *name, phys_addr_t *start, phys_addr_t *size);

+
  #endif /* _LINUX_MM_H */
diff --git a/mm/memblock.c b/mm/memblock.c
index d09136e040d3..b7b0e8c3868d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2244,6 +2244,123 @@ void __init memblock_free_all(void)
totalram_pages_add(pages);
  }
  
+/* Keep a table to reserve named memory */

+#define RESERVE_MEM_MAX_ENTRIES8
+#define RESERVE_MEM_NAME_SIZE  16
+struct reserve_mem_table {
+   charname[RESERVE_MEM_NAME_SIZE];
+   phys_addr_t start;
+   phys_addr_t size;
+};
+static struct reserve_mem_table reserved_mem_table[RESERVE_MEM_MAX_ENTRIES];

Re: [PATCH v15 05/14] x86/sgx: Implement basic EPC misc cgroup functionality

2024-06-18 Thread Haitao Huang


On Tue, 18 Jun 2024 06:31:09 -0500, Huang, Kai  wrote:




@@ -921,7 +956,8 @@ static int __init sgx_init(void)
if (!sgx_page_cache_init())
return -ENOMEM;

-   if (!sgx_page_reclaimer_init()) {
+   if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
+   misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
ret = -ENOMEM;
goto err_page_cache;
}


This code change is wrong due to two reasons:

1) If sgx_page_reclaimer_init() was successful, but sgx_cgroup_init()
failed, you actually need to 'goto err_kthread' because the ksgxd()  
kernel

thread is already created and is running.

2) There are other cases after here that can also result in sgx_init() to
fail completely, e.g., registering sgx_dev_provision mics device.  We  
need

to reset the capacity to 0 for those cases as well.

AFAICT, you need something like:

diff --git a/arch/x86/kernel/cpu/sgx/main.c
b/arch/x86/kernel/cpu/sgx/main.c
index 27892e57c4ef..46f9c26992a7 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -930,6 +930,10 @@ static int __init sgx_init(void)
if (ret)
goto err_kthread;
+   ret = sgx_cgroup_init();
+   if (ret)
+   goto err_provision;
+
/*
 * Always try to initialize the native *and* KVM drivers.
 * The KVM driver is less picky than the native one and
@@ -941,10 +945,12 @@ static int __init sgx_init(void)
ret = sgx_drv_init();
   if (sgx_vepc_init() && ret)
-   goto err_provision;
+   goto err_cgroup;
   return 0;
+err_cgroup:
+   /* SGX EPC cgroup cleanup */
 err_provision:
misc_deregister(_dev_provision);
@@ -952,6 +958,8 @@ static int __init sgx_init(void)
kthread_stop(ksgxd_tsk);
err_page_cache:
+   misc_misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
+
for (i = 0; i < sgx_nr_epc_sections; i++) {
vfree(sgx_epc_sections[i].pages);
memunmap(sgx_epc_sections[i].virt_addr);


I put the sgx_cgroup_init() before sgx_drv_init() and sgx_vepc_init(),
otherwise you will need sgx_drv_cleanup() and sgx_vepc_cleanup()
respectively when sgx_cgroup_init() fails.



Yes, good catch.


This looks a little bit weird too, though:

Calling misc_misc_cg_set_capacity() to reset capacity to 0 is done at end
of sgx_init() error path, because the "set capacity" part is done in
sgx_epc_cache_init().  
But logically, both "set capacity" and "reset capacity to 0" should be  
SGX

EPC cgroup operation, so it's more reasonable to do "set capacity" in
sgx_cgroup_init() and do "reset to 0" in the

/* SGX EPC cgroup cleanup */

as shown above.

Eventually, you will need to do EPC cgroup cleanup anyway, e.g., to free
the workqueue, so it's odd to have two places to handle EPC cgroup
cleanup.

I understand the reason "set capacity" part is done in
sgx_page_cache_init() now is because in that function you can easily get
the capacity.  But the fact is @sgx_numa_nodes also tracks EPC size for
each node, so you can also get the total EPC size from @sgx_numa_node in
sgx_cgroup_init() and set capacity there.

In this case, you can put "reset capacity to 0" and "free workqueue"
together as the "SGX EPC cgroup cleanup", which is way more clear IMHO.

Okay, will  expose @sgx_numa_nodes to epc_cgroup.c and do the calculations  
in sgx_cgroup_init().

Thanks
Haitao

Re: [PATCH] riscv: Fix early ftrace nop patching

2024-06-18 Thread Andy Chiu

On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti  wrote:
>
> Hi Conor,
>
> On 17/06/2024 15:23, Alexandre Ghiti wrote:
> > Hi Conor,
> >
> > Sorry for the delay here.
> >
> > On 13/06/2024 09:48, Conor Dooley wrote:
> >> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
> >>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> >>> converted ftrace_make_nop() to use patch_insn_write() which does not
> >>> emit any icache flush relying entirely on __ftrace_modify_code() to do
> >>> that.
> >>>
> >>> But we missed that ftrace_make_nop() was called very early directly
> >>> when
> >>> converting mcount calls into nops (actually on riscv it converts 2B
> >>> nops
> >>> emitted by the compiler into 4B nops).
> >>>
> >>> This caused crashes on multiple HW as reported by Conor and Björn since
> >>> the booting core could have half-patched instructions in its icache
> >>> which would trigger an illegal instruction trap: fix this by emitting a
> >>> local flush icache when early patching nops.
> >>>
> >>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> >>> Signed-off-by: Alexandre Ghiti 
> >>> ---
> >>>   arch/riscv/include/asm/cacheflush.h | 6 ++
> >>>   arch/riscv/kernel/ftrace.c  | 3 +++
> >>>   2 files changed, 9 insertions(+)
> >>>
> >>> diff --git a/arch/riscv/include/asm/cacheflush.h
> >>> b/arch/riscv/include/asm/cacheflush.h
> >>> index dd8d07146116..ce79c558a4c8 100644
> >>> --- a/arch/riscv/include/asm/cacheflush.h
> >>> +++ b/arch/riscv/include/asm/cacheflush.h
> >>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
> >>>   asm volatile ("fence.i" ::: "memory");
> >>>   }
> >>>   +static inline void local_flush_icache_range(unsigned long start,
> >>> +unsigned long end)
> >>> +{
> >>> +local_flush_icache_all();
> >>> +}
> >>> +
> >>>   #define PG_dcache_clean PG_arch_1
> >>> static inline void flush_dcache_folio(struct folio *folio)
> >>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> >>> index 4f4987a6d83d..32e7c401dfb4 100644
> >>> --- a/arch/riscv/kernel/ftrace.c
> >>> +++ b/arch/riscv/kernel/ftrace.c
> >>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
> >>> dyn_ftrace *rec)
> >>>   out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> >>>   mutex_unlock(_mutex);
> >> So, turns out that this patch is not sufficient. I've seen some more
> >> crashes, seemingly due to initialising nops on this mutex_unlock().
> >> Palmer suggested moving the if (!mod) ... into the lock, which fixed
> >> the problem for me.
> >
> >
> > Ok, it makes sense, I completely missed that. I'll send a fix for that
> > shortly so that it can be merged in rc5.
> >
> > Thanks,
> >
> > Alex
>
>
> So I digged a bit more and I'm afraid that the only way to make sure
> this issue does not happen elsewhere is to flush the icache right after
> the patching. We actually can't wait to batch the icache flush since
> along the way, we may call a function that has just been patched (the
> issue that you encountered here).

Trying to provide my thoughts, please let me know if I missed
anything. I think what Conor suggested is safe for init_nop, as it
would be called only when there is only one core (booting) and at the
loading stage of kernel modules. In the first case we just have to
make sure there is no patchable entry before the core executes
fence.i. The second case is unconditionally safe because there is no
read-side of the race.

It is a bit tricky for the generic (runtime) case of ftrace code
patching, but that is not because of the batch fence.i maintenance. As
long as there exists a patchable entry for the stopping thread, it is
possible for them to execute on a partially patched instruction. A
solution for this is again to prevent any patchable entry in the
stop_machine's stopping thread. Another solution is to apply the
atomic ftrace patching series which aims to get rid of the race.

>
> I don't know how much it will impact the performance but I guess it will.
>
> Unless someone has a better idea (I added Andy and Puranjay in cc), here
> is the patch that implements this, can you give it a try? Thanks
>
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 87cbd86576b2..4b95c574fd04 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct
> dyn_ftrace *rec)
>  out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>  mutex_unlock(_mutex);
>
> -   if (!mod)
> -   local_flush_icache_range(rec->ip, rec->ip +
> MCOUNT_INSN_SIZE);
> -
>  return out;
>   }
>
> @@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
>  } else {
>  while (atomic_read(>cpu_count) <= num_online_cpus())
>  cpu_relax();
> -   }
>
> -   local_flush_icache_all();
> +

Re: [PATCH v4 11/35] kmsan: Allow disabling KMSAN checks for the current task

2024-06-18 Thread Alexander Potapenko

On Thu, Jun 13, 2024 at 5:39 PM Ilya Leoshkevich  wrote:
>
> Like for KASAN, it's useful to temporarily disable KMSAN checks around,
> e.g., redzone accesses. Introduce kmsan_disable_current() and
> kmsan_enable_current(), which are similar to their KASAN counterparts.
>
> Make them reentrant in order to handle memory allocations in interrupt
> context. Repurpose the allow_reporting field for this.

I am still a bit reluctant, because these nested counters always end
up being inconsistent.
But your patch series fixes support for SLUB_DEBUG, and I don't have
better ideas how to do this.

Could you please extend "Disabling the instrumentation" in kmsan.rst
so that it explains the new enable/disable API?
I think we should mention that the users need to be careful with it,
keeping the regions short and preferring other ways to disable
instrumentation, where possible.

> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko

[PATCH RESEND v10 01/12] clk: mmp: Switch to use struct u32_fract instead of custom one

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Andy Shevchenko 

The struct mmp_clk_factor_tbl repeats the generic struct u32_fract.
Kill the custom one and use the generic one instead.

Signed-off-by: Andy Shevchenko 
Tested-by: Duje Mihanović 
Reviewed-by: Linus Walleij 
Reviewed-by: Stephen Boyd 
Signed-off-by: Duje Mihanović 
---
 drivers/clk/mmp/clk-frac.c   | 57 
 drivers/clk/mmp/clk-of-mmp2.c| 26 +-
 drivers/clk/mmp/clk-of-pxa168.c  |  4 +--
 drivers/clk/mmp/clk-of-pxa1928.c |  6 ++---
 drivers/clk/mmp/clk-of-pxa910.c  |  4 +--
 drivers/clk/mmp/clk.h| 10 +++
 6 files changed, 51 insertions(+), 56 deletions(-)

diff --git a/drivers/clk/mmp/clk-frac.c b/drivers/clk/mmp/clk-frac.c
index 1b90867b60c4..6556f6ada2e8 100644
--- a/drivers/clk/mmp/clk-frac.c
+++ b/drivers/clk/mmp/clk-frac.c
@@ -26,14 +26,15 @@ static long clk_factor_round_rate(struct clk_hw *hw, 
unsigned long drate,
 {
struct mmp_clk_factor *factor = to_clk_factor(hw);
u64 rate = 0, prev_rate;
+   struct u32_fract *d;
int i;
 
for (i = 0; i < factor->ftbl_cnt; i++) {
-   prev_rate = rate;
-   rate = *prate;
-   rate *= factor->ftbl[i].den;
-   do_div(rate, factor->ftbl[i].num * factor->masks->factor);
+   d = >ftbl[i];
 
+   prev_rate = rate;
+   rate = (u64)(*prate) * d->denominator;
+   do_div(rate, d->numerator * factor->masks->factor);
if (rate > drate)
break;
}
@@ -52,23 +53,22 @@ static unsigned long clk_factor_recalc_rate(struct clk_hw 
*hw,
 {
struct mmp_clk_factor *factor = to_clk_factor(hw);
struct mmp_clk_factor_masks *masks = factor->masks;
-   unsigned int val, num, den;
+   struct u32_fract d;
+   unsigned int val;
u64 rate;
 
val = readl_relaxed(factor->base);
 
/* calculate numerator */
-   num = (val >> masks->num_shift) & masks->num_mask;
+   d.numerator = (val >> masks->num_shift) & masks->num_mask;
 
/* calculate denominator */
-   den = (val >> masks->den_shift) & masks->den_mask;
-
-   if (!den)
+   d.denominator = (val >> masks->den_shift) & masks->den_mask;
+   if (!d.denominator)
return 0;
 
-   rate = parent_rate;
-   rate *= den;
-   do_div(rate, num * factor->masks->factor);
+   rate = (u64)parent_rate * d.denominator;
+   do_div(rate, d.numerator * factor->masks->factor);
 
return rate;
 }
@@ -82,18 +82,18 @@ static int clk_factor_set_rate(struct clk_hw *hw, unsigned 
long drate,
int i;
unsigned long val;
unsigned long flags = 0;
+   struct u32_fract *d;
u64 rate = 0;
 
for (i = 0; i < factor->ftbl_cnt; i++) {
-   rate = prate;
-   rate *= factor->ftbl[i].den;
-   do_div(rate, factor->ftbl[i].num * factor->masks->factor);
+   d = >ftbl[i];
 
+   rate = (u64)prate * d->denominator;
+   do_div(rate, d->numerator * factor->masks->factor);
if (rate > drate)
break;
}
-   if (i > 0)
-   i--;
+   d = i ? >ftbl[i - 1] : >ftbl[0];
 
if (factor->lock)
spin_lock_irqsave(factor->lock, flags);
@@ -101,10 +101,10 @@ static int clk_factor_set_rate(struct clk_hw *hw, 
unsigned long drate,
val = readl_relaxed(factor->base);
 
val &= ~(masks->num_mask << masks->num_shift);
-   val |= (factor->ftbl[i].num & masks->num_mask) << masks->num_shift;
+   val |= (d->numerator & masks->num_mask) << masks->num_shift;
 
val &= ~(masks->den_mask << masks->den_shift);
-   val |= (factor->ftbl[i].den & masks->den_mask) << masks->den_shift;
+   val |= (d->denominator & masks->den_mask) << masks->den_shift;
 
writel_relaxed(val, factor->base);
 
@@ -118,7 +118,8 @@ static int clk_factor_init(struct clk_hw *hw)
 {
struct mmp_clk_factor *factor = to_clk_factor(hw);
struct mmp_clk_factor_masks *masks = factor->masks;
-   u32 val, num, den;
+   struct u32_fract d;
+   u32 val;
int i;
unsigned long flags = 0;
 
@@ -128,23 +129,22 @@ static int clk_factor_init(struct clk_hw *hw)
val = readl(factor->base);
 
/* calculate numerator */
-   num = (val >> masks->num_shift) & masks->num_mask;
+   d.numerator = (val >> masks->num_shift) & masks->num_mask;
 
/* calculate denominator */
-   den = (val >> masks->den_shift) & masks->den_mask;
+   d.denominator = (val >> masks->den_shift) & masks->den_mask;
 
for (i = 0; i < factor->ftbl_cnt; i++)
-   if (den == factor->ftbl[i].den && num == factor->ftbl[i].num)
+   if (d.denominator == factor->ftbl[i].denominator &&
+   d.numerator == factor->ftbl[i].numerator)

[PATCH RESEND v10 12/12] MAINTAINERS: add myself as Marvell PXA1908 maintainer

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add myself as the maintainer for Marvell PXA1908 SoC support.

Signed-off-by: Duje Mihanović 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ebf03f5f0619..5d48ac9801df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2370,6 +2370,15 @@ F:   drivers/irqchip/irq-mvebu-*
 F: drivers/pinctrl/mvebu/
 F: drivers/rtc/rtc-armada38x.c
 
+ARM/Marvell PXA1908 SOC support
+M: Duje Mihanović 
+L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
+S: Maintained
+T: git https://gitlab.com/LegoLivesMatter/linux
+F: arch/arm64/boot/dts/marvell/pxa1908*
+F: drivers/clk/mmp/clk-pxa1908*.c
+F: include/dt-bindings/clock/marvell,pxa1908.h
+
 ARM/Mediatek RTC DRIVER
 M: Eddie Huang 
 M: Sean Wang 

-- 
2.45.2

[PATCH RESEND v10 11/12] arm64: dts: Add DTS for Marvell PXA1908 and samsung,coreprimevelte

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add DTS for Marvell PXA1908 SoC and Samsung Galaxy Core Prime Value
Edition LTE, a smartphone based on said SoC.

Signed-off-by: Duje Mihanović 
---
 arch/arm64/boot/dts/marvell/Makefile   |   3 +
 .../dts/marvell/pxa1908-samsung-coreprimevelte.dts | 328 +
 arch/arm64/boot/dts/marvell/pxa1908.dtsi   | 300 +++
 3 files changed, 631 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/Makefile 
b/arch/arm64/boot/dts/marvell/Makefile
index 99b8cb3c49e1..687c256d95fe 100644
--- a/arch/arm64/boot/dts/marvell/Makefile
+++ b/arch/arm64/boot/dts/marvell/Makefile
@@ -28,3 +28,6 @@ dtb-$(CONFIG_ARCH_MVEBU) += cn9130-crb-A.dtb
 dtb-$(CONFIG_ARCH_MVEBU) += cn9130-crb-B.dtb
 dtb-$(CONFIG_ARCH_MVEBU) += ac5x-rd-carrier-cn9131.dtb
 dtb-$(CONFIG_ARCH_MVEBU) += ac5-98dx35xx-rd.dtb
+
+# MMP SoC Family
+dtb-$(CONFIG_ARCH_MMP) += pxa1908-samsung-coreprimevelte.dtb
diff --git a/arch/arm64/boot/dts/marvell/pxa1908-samsung-coreprimevelte.dts 
b/arch/arm64/boot/dts/marvell/pxa1908-samsung-coreprimevelte.dts
new file mode 100644
index ..2a1f309c8e54
--- /dev/null
+++ b/arch/arm64/boot/dts/marvell/pxa1908-samsung-coreprimevelte.dts
@@ -0,0 +1,328 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include "pxa1908.dtsi"
+#include 
+#include 
+
+/ {
+   model = "Samsung Galaxy Core Prime VE LTE";
+   compatible = "samsung,coreprimevelte", "marvell,pxa1908";
+
+   aliases {
+   mmc0 =  /* eMMC */
+   mmc1 =  /* SD card */
+   serial0 = 
+   };
+
+   chosen {
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   stdout-path = "serial0:115200n8";
+
+   /* S-Boot places the initramfs here */
+   linux,initrd-start = <0x4d7>;
+   linux,initrd-end = <0x500>;
+
+   fb0: framebuffer@17177000 {
+   compatible = "simple-framebuffer";
+   reg = <0 0x17177000 0 (480 * 800 * 4)>;
+   width = <480>;
+   height = <800>;
+   stride = <(480 * 4)>;
+   format = "a8r8g8b8";
+   };
+   };
+
+   /* Bootloader fills this in */
+   memory {
+   device_type = "memory";
+   reg = <0 0 0 0>;
+   };
+
+   reserved-memory {
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   framebuffer@1700 {
+   reg = <0 0x1700 0 0x180>;
+   no-map;
+   };
+
+   gpu@900 {
+   reg = <0 0x900 0 0x100>;
+   };
+
+   /* Communications processor, aka modem */
+   cp@500 {
+   reg = <0 0x500 0 0x300>;
+   };
+
+   cm3@a00 {
+   reg = <0 0xa00 0 0x8>;
+   };
+
+   seclog@800 {
+   reg = <0 0x800 0 0x10>;
+   };
+
+   ramoops@810 {
+   compatible = "ramoops";
+   reg = <0 0x810 0 0x4>;
+   record-size = <0x8000>;
+   console-size = <0x2>;
+   max-reason = <5>;
+   };
+   };
+
+
+   i2c-muic {
+   compatible = "i2c-gpio";
+   sda-gpios = < 30 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>;
+   scl-gpios = < 29 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>;
+   i2c-gpio,delay-us = <3>;
+   i2c-gpio,timeout-ms = <100>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   pinctrl-names = "default";
+   pinctrl-0 = <_muic_pins>;
+
+   muic: extcon@14 {
+   compatible = "siliconmitus,sm5504-muic";
+   reg = <0x14>;
+   interrupt-parent = <>;
+   interrupts = <0 IRQ_TYPE_EDGE_FALLING>;
+   };
+   };
+
+   gpio-keys {
+   compatible = "gpio-keys";
+   pinctrl-names = "default";
+   pinctrl-0 = <_keys_pins>;
+   autorepeat;
+
+   key-home {
+   label = "Home";
+   linux,code = ;
+   gpios = < 50 GPIO_ACTIVE_LOW>;
+   };
+
+   key-volup {
+   label = "Volume Up";
+   linux,code = ;
+   gpios = < 16 GPIO_ACTIVE_LOW>;
+   };
+
+   key-voldown {
+   label = "Volume Down";
+   linux,code = ;
+   gpios = < 17 GPIO_ACTIVE_LOW>;
+   };
+   };
+};
+
+ {
+   status =

[PATCH RESEND v10 10/12] arm64: Kconfig.platforms: Add config for Marvell PXA1908 platform

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add ARCH_MMP configuration option for Marvell PXA1908 SoC.

Signed-off-by: Duje Mihanović 
---
 arch/arm64/Kconfig.platforms | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
index 24335565bad5..d71b0b6e75aa 100644
--- a/arch/arm64/Kconfig.platforms
+++ b/arch/arm64/Kconfig.platforms
@@ -168,6 +168,14 @@ config ARCH_MESON
  This enables support for the arm64 based Amlogic SoCs
  such as the s905, S905X/D, S912, A113X/D or S905X/D2
 
+config ARCH_MMP
+   bool "Marvell MMP SoC Family"
+   select PINCTRL
+   select PINCTRL_SINGLE
+   help
+ This enables support for Marvell MMP SoC family, currently
+ supporting PXA1908 aka IAP140.
+
 config ARCH_MVEBU
bool "Marvell EBU SoC Family"
select ARMADA_AP806_SYSCON

-- 
2.45.2

[PATCH RESEND v10 09/12] dt-bindings: marvell: Document PXA1908 SoC

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add dt binding for the Marvell PXA1908 SoC.

Reviewed-by: Krzysztof Kozlowski 
Signed-off-by: Duje Mihanović 
---
 Documentation/devicetree/bindings/arm/mrvl/mrvl.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/mrvl/mrvl.yaml 
b/Documentation/devicetree/bindings/arm/mrvl/mrvl.yaml
index 4c43eaf3632e..f73bb8ec3a1a 100644
--- a/Documentation/devicetree/bindings/arm/mrvl/mrvl.yaml
+++ b/Documentation/devicetree/bindings/arm/mrvl/mrvl.yaml
@@ -35,6 +35,11 @@ properties:
   - enum:
   - dell,wyse-ariel
   - const: marvell,mmp3
+  - description: PXA1908 based boards
+items:
+  - enum:
+  - samsung,coreprimevelte
+  - const: marvell,pxa1908
 
 additionalProperties: true
 

-- 
2.45.2

[PATCH RESEND v10 05/12] clk: mmp: Add Marvell PXA1908 APBC driver

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add driver for the APBC controller block found on Marvell's PXA1908 SoC.

Signed-off-by: Duje Mihanović 
---
 drivers/clk/mmp/Makefile   |   2 +-
 drivers/clk/mmp/clk-pxa1908-apbc.c | 131 +
 2 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/mmp/Makefile b/drivers/clk/mmp/Makefile
index 441bf83080a1..685bb80f8ae1 100644
--- a/drivers/clk/mmp/Makefile
+++ b/drivers/clk/mmp/Makefile
@@ -11,4 +11,4 @@ obj-$(CONFIG_MACH_MMP_DT) += clk-of-pxa168.o clk-of-pxa910.o
 obj-$(CONFIG_COMMON_CLK_MMP2) += clk-of-mmp2.o clk-pll.o pwr-island.o
 obj-$(CONFIG_COMMON_CLK_MMP2_AUDIO) += clk-audio.o
 
-obj-y += clk-of-pxa1928.o
+obj-$(CONFIG_ARCH_MMP) += clk-of-pxa1928.o clk-pxa1908-apbc.o
diff --git a/drivers/clk/mmp/clk-pxa1908-apbc.c 
b/drivers/clk/mmp/clk-pxa1908-apbc.c
new file mode 100644
index ..a418b9f895c1
--- /dev/null
+++ b/drivers/clk/mmp/clk-pxa1908-apbc.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "clk.h"
+
+#define APBC_UART0 0x0
+#define APBC_UART1 0x4
+#define APBC_GPIO  0x8
+#define APBC_PWM0  0xc
+#define APBC_PWM1  0x10
+#define APBC_PWM2  0x14
+#define APBC_PWM3  0x18
+#define APBC_SSP0  0x1c
+#define APBC_SSP1  0x20
+#define APBC_IPC_RST   0x24
+#define APBC_RTC   0x28
+#define APBC_TWSI0 0x2c
+#define APBC_KPC   0x30
+#define APBC_SWJTAG0x40
+#define APBC_SSP2  0x4c
+#define APBC_TWSI1 0x60
+#define APBC_THERMAL   0x6c
+#define APBC_TWSI3 0x70
+
+#define APBC_NR_CLKS   19
+
+struct pxa1908_clk_unit {
+   struct mmp_clk_unit unit;
+   void __iomem *base;
+};
+
+static DEFINE_SPINLOCK(pwm0_lock);
+static DEFINE_SPINLOCK(pwm2_lock);
+
+static DEFINE_SPINLOCK(uart0_lock);
+static DEFINE_SPINLOCK(uart1_lock);
+
+static const char * const uart_parent_names[] = {"pll1_117", "uart_pll"};
+static const char * const ssp_parent_names[] = {"pll1_d16", "pll1_d48", 
"pll1_d24", "pll1_d12"};
+
+static struct mmp_param_gate_clk apbc_gate_clks[] = {
+   {PXA1908_CLK_TWSI0, "twsi0_clk", "pll1_32", CLK_SET_RATE_PARENT, 
APBC_TWSI0, 0x7, 3, 0, 0, NULL},
+   {PXA1908_CLK_TWSI1, "twsi1_clk", "pll1_32", CLK_SET_RATE_PARENT, 
APBC_TWSI1, 0x7, 3, 0, 0, NULL},
+   {PXA1908_CLK_TWSI3, "twsi3_clk", "pll1_32", CLK_SET_RATE_PARENT, 
APBC_TWSI3, 0x7, 3, 0, 0, NULL},
+   {PXA1908_CLK_GPIO, "gpio_clk", "vctcxo", CLK_SET_RATE_PARENT, 
APBC_GPIO, 0x7, 3, 0, 0, NULL},
+   {PXA1908_CLK_KPC, "kpc_clk", "clk32", CLK_SET_RATE_PARENT, APBC_KPC, 
0x7, 3, 0, MMP_CLK_GATE_NEED_DELAY, NULL},
+   {PXA1908_CLK_RTC, "rtc_clk", "clk32", CLK_SET_RATE_PARENT, APBC_RTC, 
0x87, 0x83, 0, MMP_CLK_GATE_NEED_DELAY, NULL},
+   {PXA1908_CLK_PWM0, "pwm0_clk", "pwm01_apb_share", CLK_SET_RATE_PARENT, 
APBC_PWM0, 0x2, 2, 0, 0, _lock},
+   {PXA1908_CLK_PWM1, "pwm1_clk", "pwm01_apb_share", CLK_SET_RATE_PARENT, 
APBC_PWM1, 0x6, 2, 0, 0, NULL},
+   {PXA1908_CLK_PWM2, "pwm2_clk", "pwm23_apb_share", CLK_SET_RATE_PARENT, 
APBC_PWM2, 0x2, 2, 0, 0, NULL},
+   {PXA1908_CLK_PWM3, "pwm3_clk", "pwm23_apb_share", CLK_SET_RATE_PARENT, 
APBC_PWM3, 0x6, 2, 0, 0, NULL},
+   {PXA1908_CLK_UART0, "uart0_clk", "uart0_mux", CLK_SET_RATE_PARENT, 
APBC_UART0, 0x7, 3, 0, 0, _lock},
+   {PXA1908_CLK_UART1, "uart1_clk", "uart1_mux", CLK_SET_RATE_PARENT, 
APBC_UART1, 0x7, 3, 0, 0, _lock},
+   {PXA1908_CLK_THERMAL, "thermal_clk", NULL, 0, APBC_THERMAL, 0x7, 3, 0, 
0, NULL},
+   {PXA1908_CLK_IPC_RST, "ipc_clk", NULL, 0, APBC_IPC_RST, 0x7, 3, 0, 0, 
NULL},
+   {PXA1908_CLK_SSP0, "ssp0_clk", "ssp0_mux", 0, APBC_SSP0, 0x7, 3, 0, 0, 
NULL},
+   {PXA1908_CLK_SSP2, "ssp2_clk", "ssp2_mux", 0, APBC_SSP2, 0x7, 3, 0, 0, 
NULL},
+};
+
+static struct mmp_param_mux_clk apbc_mux_clks[] = {
+   {0, "uart0_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), 
CLK_SET_RATE_PARENT, APBC_UART0, 4, 3, 0, _lock},
+   {0, "uart1_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), 
CLK_SET_RATE_PARENT, APBC_UART1, 4, 3, 0, _lock},
+   {0, "ssp0_mux", ssp_parent_names, ARRAY_SIZE(ssp_parent_names), 0, 
APBC_SSP0, 4, 3, 0, NULL},
+   {0, "ssp2_mux", ssp_parent_names, ARRAY_SIZE(ssp_parent_names), 0, 
APBC_SSP2, 4, 3, 0, NULL},
+};
+
+static void pxa1908_apb_periph_clk_init(struct pxa1908_clk_unit *pxa_unit)
+{
+   struct mmp_clk_unit *unit = _unit->unit;
+   struct clk *clk;
+
+   mmp_clk_register_gate(NULL, "pwm01_apb_share", "pll1_d48",
+   CLK_SET_RATE_PARENT,
+   pxa_unit->base + APBC_PWM0,
+   0x5, 1, 0, 0, _lock);
+   mmp_clk_register_gate(NULL, "pwm23_apb_share", "pll1_d48",
+   CLK_SET_RATE_PARENT,
+

[PATCH RESEND v10 06/12] clk: mmp: Add Marvell PXA1908 APBCP driver

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add driver for the APBCP controller block found on Marvell's PXA1908
SoC.

Signed-off-by: Duje Mihanović 
---
 drivers/clk/mmp/Makefile|  2 +-
 drivers/clk/mmp/clk-pxa1908-apbcp.c | 84 +
 2 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/mmp/Makefile b/drivers/clk/mmp/Makefile
index 685bb80f8ae1..038bcd4d035e 100644
--- a/drivers/clk/mmp/Makefile
+++ b/drivers/clk/mmp/Makefile
@@ -11,4 +11,4 @@ obj-$(CONFIG_MACH_MMP_DT) += clk-of-pxa168.o clk-of-pxa910.o
 obj-$(CONFIG_COMMON_CLK_MMP2) += clk-of-mmp2.o clk-pll.o pwr-island.o
 obj-$(CONFIG_COMMON_CLK_MMP2_AUDIO) += clk-audio.o
 
-obj-$(CONFIG_ARCH_MMP) += clk-of-pxa1928.o clk-pxa1908-apbc.o
+obj-$(CONFIG_ARCH_MMP) += clk-of-pxa1928.o clk-pxa1908-apbc.o 
clk-pxa1908-apbcp.o
diff --git a/drivers/clk/mmp/clk-pxa1908-apbcp.c 
b/drivers/clk/mmp/clk-pxa1908-apbcp.c
new file mode 100644
index ..2a53cf276407
--- /dev/null
+++ b/drivers/clk/mmp/clk-pxa1908-apbcp.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "clk.h"
+
+#define APBCP_UART20x1c
+#define APBCP_TWSI20x28
+#define APBCP_AICER0x38
+
+#define APBCP_NR_CLKS  4
+
+struct pxa1908_clk_unit {
+   struct mmp_clk_unit unit;
+   void __iomem *base;
+};
+
+static DEFINE_SPINLOCK(uart2_lock);
+
+static const char * const uart_parent_names[] = {"pll1_117", "uart_pll"};
+
+static struct mmp_param_gate_clk apbcp_gate_clks[] = {
+   {PXA1908_CLK_UART2, "uart2_clk", "uart2_mux", CLK_SET_RATE_PARENT, 
APBCP_UART2, 0x7, 0x3, 0x0, 0, _lock},
+   {PXA1908_CLK_TWSI2, "twsi2_clk", "pll1_32", CLK_SET_RATE_PARENT, 
APBCP_TWSI2, 0x7, 0x3, 0x0, 0, NULL},
+   {PXA1908_CLK_AICER, "ripc_clk", NULL, 0, APBCP_AICER, 0x7, 0x2, 0x0, 0, 
NULL},
+};
+
+static struct mmp_param_mux_clk apbcp_mux_clks[] = {
+   {0, "uart2_mux", uart_parent_names, ARRAY_SIZE(uart_parent_names), 
CLK_SET_RATE_PARENT, APBCP_UART2, 4, 3, 0, _lock},
+};
+
+static void pxa1908_apb_p_periph_clk_init(struct pxa1908_clk_unit *pxa_unit)
+{
+   struct mmp_clk_unit *unit = _unit->unit;
+
+   mmp_register_mux_clks(unit, apbcp_mux_clks, pxa_unit->base,
+   ARRAY_SIZE(apbcp_mux_clks));
+   mmp_register_gate_clks(unit, apbcp_gate_clks, pxa_unit->base,
+   ARRAY_SIZE(apbcp_gate_clks));
+}
+
+static int pxa1908_apbcp_probe(struct platform_device *pdev)
+{
+   struct pxa1908_clk_unit *pxa_unit;
+
+   pxa_unit = devm_kzalloc(>dev, sizeof(*pxa_unit), GFP_KERNEL);
+   if (IS_ERR(pxa_unit))
+   return PTR_ERR(pxa_unit);
+
+   pxa_unit->base = devm_platform_ioremap_resource(pdev, 0);
+   if (IS_ERR(pxa_unit->base))
+   return PTR_ERR(pxa_unit->base);
+
+   mmp_clk_init(pdev->dev.of_node, _unit->unit, APBCP_NR_CLKS);
+
+   pxa1908_apb_p_periph_clk_init(pxa_unit);
+
+   return 0;
+}
+
+static const struct of_device_id pxa1908_apbcp_match_table[] = {
+   { .compatible = "marvell,pxa1908-apbcp" },
+   { }
+};
+MODULE_DEVICE_TABLE(of, pxa1908_apbcp_match_table);
+
+static struct platform_driver pxa1908_apbcp_driver = {
+   .probe = pxa1908_apbcp_probe,
+   .driver = {
+   .name = "pxa1908-apbcp",
+   .of_match_table = pxa1908_apbcp_match_table
+   }
+};
+module_platform_driver(pxa1908_apbcp_driver);
+
+MODULE_AUTHOR("Duje Mihanović ");
+MODULE_DESCRIPTION("Marvell PXA1908 APBCP Clock Driver");
+MODULE_LICENSE("GPL");

-- 
2.45.2

[PATCH RESEND v10 08/12] clk: mmp: Add Marvell PXA1908 MPMU driver

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add driver for the MPMU controller block on Marvell's PXA1908 SoC. The
driver is incomplete, currently only supporting the fixed PLL1; dynamic
PLLs 2-4 and CPU/DDR/AXI clock support is missing.

Signed-off-by: Duje Mihanović 
---
 drivers/clk/mmp/Makefile   |   2 +-
 drivers/clk/mmp/clk-pxa1908-mpmu.c | 112 +
 2 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/mmp/Makefile b/drivers/clk/mmp/Makefile
index a8b1a4b08824..062cd87fa8dd 100644
--- a/drivers/clk/mmp/Makefile
+++ b/drivers/clk/mmp/Makefile
@@ -11,4 +11,4 @@ obj-$(CONFIG_MACH_MMP_DT) += clk-of-pxa168.o clk-of-pxa910.o
 obj-$(CONFIG_COMMON_CLK_MMP2) += clk-of-mmp2.o clk-pll.o pwr-island.o
 obj-$(CONFIG_COMMON_CLK_MMP2_AUDIO) += clk-audio.o
 
-obj-$(CONFIG_ARCH_MMP) += clk-of-pxa1928.o clk-pxa1908-apbc.o 
clk-pxa1908-apbcp.o clk-pxa1908-apmu.o
+obj-$(CONFIG_ARCH_MMP) += clk-of-pxa1928.o clk-pxa1908-apbc.o 
clk-pxa1908-apbcp.o clk-pxa1908-apmu.o clk-pxa1908-mpmu.o
diff --git a/drivers/clk/mmp/clk-pxa1908-mpmu.c 
b/drivers/clk/mmp/clk-pxa1908-mpmu.c
new file mode 100644
index ..e3337bacaadd
--- /dev/null
+++ b/drivers/clk/mmp/clk-pxa1908-mpmu.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "clk.h"
+
+#define MPMU_UART_PLL  0x14
+
+#define MPMU_NR_CLKS   39
+
+struct pxa1908_clk_unit {
+   struct mmp_clk_unit unit;
+   void __iomem *base;
+};
+
+static struct mmp_param_fixed_rate_clk fixed_rate_clks[] = {
+   {PXA1908_CLK_CLK32, "clk32", NULL, 0, 32768},
+   {PXA1908_CLK_VCTCXO, "vctcxo", NULL, 0, 26 * HZ_PER_MHZ},
+   {PXA1908_CLK_PLL1_624, "pll1_624", NULL, 0, 624 * HZ_PER_MHZ},
+   {PXA1908_CLK_PLL1_416, "pll1_416", NULL, 0, 416 * HZ_PER_MHZ},
+   {PXA1908_CLK_PLL1_499, "pll1_499", NULL, 0, 499 * HZ_PER_MHZ},
+   {PXA1908_CLK_PLL1_832, "pll1_832", NULL, 0, 832 * HZ_PER_MHZ},
+   {PXA1908_CLK_PLL1_1248, "pll1_1248", NULL, 0, 1248 * HZ_PER_MHZ},
+};
+
+static struct mmp_param_fixed_factor_clk fixed_factor_clks[] = {
+   {PXA1908_CLK_PLL1_D2, "pll1_d2", "pll1_624", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D4, "pll1_d4", "pll1_d2", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D6, "pll1_d6", "pll1_d2", 1, 3, 0},
+   {PXA1908_CLK_PLL1_D8, "pll1_d8", "pll1_d4", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D12, "pll1_d12", "pll1_d6", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D13, "pll1_d13", "pll1_624", 1, 13, 0},
+   {PXA1908_CLK_PLL1_D16, "pll1_d16", "pll1_d8", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D24, "pll1_d24", "pll1_d12", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D48, "pll1_d48", "pll1_d24", 1, 2, 0},
+   {PXA1908_CLK_PLL1_D96, "pll1_d96", "pll1_d48", 1, 2, 0},
+   {PXA1908_CLK_PLL1_32, "pll1_32", "pll1_d13", 2, 3, 0},
+   {PXA1908_CLK_PLL1_208, "pll1_208", "pll1_d2", 2, 3, 0},
+   {PXA1908_CLK_PLL1_117, "pll1_117", "pll1_624", 3, 16, 0},
+};
+
+static struct u32_fract uart_factor_tbl[] = {
+   {.numerator = 8125, .denominator = 1536},   /* 14.745MHz */
+};
+
+static struct mmp_clk_factor_masks uart_factor_masks = {
+   .factor = 2,
+   .num_mask = GENMASK(12, 0),
+   .den_mask = GENMASK(12, 0),
+   .num_shift = 16,
+   .den_shift = 0,
+};
+
+static void pxa1908_pll_init(struct pxa1908_clk_unit *pxa_unit)
+{
+   struct mmp_clk_unit *unit = _unit->unit;
+
+   mmp_register_fixed_rate_clks(unit, fixed_rate_clks,
+   ARRAY_SIZE(fixed_rate_clks));
+
+   mmp_register_fixed_factor_clks(unit, fixed_factor_clks,
+   ARRAY_SIZE(fixed_factor_clks));
+
+   mmp_clk_register_factor("uart_pll", "pll1_d4",
+   CLK_SET_RATE_PARENT,
+   pxa_unit->base + MPMU_UART_PLL,
+   _factor_masks, uart_factor_tbl,
+   ARRAY_SIZE(uart_factor_tbl), NULL);
+}
+
+static int pxa1908_mpmu_probe(struct platform_device *pdev)
+{
+   struct pxa1908_clk_unit *pxa_unit;
+
+   pxa_unit = devm_kzalloc(>dev, sizeof(*pxa_unit), GFP_KERNEL);
+   if (IS_ERR(pxa_unit))
+   return PTR_ERR(pxa_unit);
+
+   pxa_unit->base = devm_platform_ioremap_resource(pdev, 0);
+   if (IS_ERR(pxa_unit->base))
+   return PTR_ERR(pxa_unit->base);
+
+   mmp_clk_init(pdev->dev.of_node, _unit->unit, MPMU_NR_CLKS);
+
+   pxa1908_pll_init(pxa_unit);
+
+   return 0;
+}
+
+static const struct of_device_id pxa1908_mpmu_match_table[] = {
+   { .compatible = "marvell,pxa1908-mpmu" },
+   { }
+};
+MODULE_DEVICE_TABLE(of, pxa1908_mpmu_match_table);
+
+static struct platform_driver pxa1908_mpmu_driver = {
+   .probe = pxa1908_mpmu_probe,
+   .driver = {
+   .name = "pxa1908-mpmu",
+   .of_match_table = pxa1908_mpmu_match_table
+   }
+};
+module_platform_driver(pxa1908_mpmu_driver);
+

[PATCH RESEND v10 07/12] clk: mmp: Add Marvell PXA1908 APMU driver

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add driver for the APMU controller block found on Marvell's PXA1908 SoC.
This driver is incomplete, lacking support for (at least) GPU, VPU, DSI
and CCIC (camera related) clocks.

Signed-off-by: Duje Mihanović 
---
 drivers/clk/mmp/Makefile   |   2 +-
 drivers/clk/mmp/clk-pxa1908-apmu.c | 123 +
 2 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/mmp/Makefile b/drivers/clk/mmp/Makefile
index 038bcd4d035e..a8b1a4b08824 100644
--- a/drivers/clk/mmp/Makefile
+++ b/drivers/clk/mmp/Makefile
@@ -11,4 +11,4 @@ obj-$(CONFIG_MACH_MMP_DT) += clk-of-pxa168.o clk-of-pxa910.o
 obj-$(CONFIG_COMMON_CLK_MMP2) += clk-of-mmp2.o clk-pll.o pwr-island.o
 obj-$(CONFIG_COMMON_CLK_MMP2_AUDIO) += clk-audio.o
 
-obj-$(CONFIG_ARCH_MMP) += clk-of-pxa1928.o clk-pxa1908-apbc.o 
clk-pxa1908-apbcp.o
+obj-$(CONFIG_ARCH_MMP) += clk-of-pxa1928.o clk-pxa1908-apbc.o 
clk-pxa1908-apbcp.o clk-pxa1908-apmu.o
diff --git a/drivers/clk/mmp/clk-pxa1908-apmu.c 
b/drivers/clk/mmp/clk-pxa1908-apmu.c
new file mode 100644
index ..693254539063
--- /dev/null
+++ b/drivers/clk/mmp/clk-pxa1908-apmu.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "clk.h"
+
+#define APMU_CLK_GATE_CTRL 0x40
+#define APMU_CCIC1 0x24
+#define APMU_ISP   0x38
+#define APMU_DSI1  0x44
+#define APMU_DISP1 0x4c
+#define APMU_CCIC0 0x50
+#define APMU_SDH0  0x54
+#define APMU_SDH1  0x58
+#define APMU_USB   0x5c
+#define APMU_NF0x60
+#define APMU_VPU   0xa4
+#define APMU_GC0xcc
+#define APMU_SDH2  0xe0
+#define APMU_GC2D  0xf4
+#define APMU_TRACE 0x108
+#define APMU_DVC_DFC_DEBUG 0x140
+
+#define APMU_NR_CLKS   17
+
+struct pxa1908_clk_unit {
+   struct mmp_clk_unit unit;
+   void __iomem *base;
+};
+
+static DEFINE_SPINLOCK(pll1_lock);
+static struct mmp_param_general_gate_clk pll1_gate_clks[] = {
+   {PXA1908_CLK_PLL1_D2_GATE, "pll1_d2_gate", "pll1_d2", 0, 
APMU_CLK_GATE_CTRL, 29, 0, _lock},
+   {PXA1908_CLK_PLL1_416_GATE, "pll1_416_gate", "pll1_416", 0, 
APMU_CLK_GATE_CTRL, 27, 0, _lock},
+   {PXA1908_CLK_PLL1_624_GATE, "pll1_624_gate", "pll1_624", 0, 
APMU_CLK_GATE_CTRL, 26, 0, _lock},
+   {PXA1908_CLK_PLL1_832_GATE, "pll1_832_gate", "pll1_832", 0, 
APMU_CLK_GATE_CTRL, 30, 0, _lock},
+   {PXA1908_CLK_PLL1_1248_GATE, "pll1_1248_gate", "pll1_1248", 0, 
APMU_CLK_GATE_CTRL, 28, 0, _lock},
+};
+
+static DEFINE_SPINLOCK(sdh0_lock);
+static DEFINE_SPINLOCK(sdh1_lock);
+static DEFINE_SPINLOCK(sdh2_lock);
+
+static const char * const sdh_parent_names[] = {"pll1_416", "pll1_624"};
+
+static struct mmp_clk_mix_config sdh_mix_config = {
+   .reg_info = DEFINE_MIX_REG_INFO(3, 8, 2, 6, 11),
+};
+
+static struct mmp_param_gate_clk apmu_gate_clks[] = {
+   {PXA1908_CLK_USB, "usb_clk", NULL, 0, APMU_USB, 0x9, 0x9, 0x1, 0, NULL},
+   {PXA1908_CLK_SDH0, "sdh0_clk", "sdh0_mix_clk", CLK_SET_RATE_PARENT | 
CLK_SET_RATE_UNGATE, APMU_SDH0, 0x12, 0x12, 0x0, 0, _lock},
+   {PXA1908_CLK_SDH1, "sdh1_clk", "sdh1_mix_clk", CLK_SET_RATE_PARENT | 
CLK_SET_RATE_UNGATE, APMU_SDH1, 0x12, 0x12, 0x0, 0, _lock},
+   {PXA1908_CLK_SDH2, "sdh2_clk", "sdh2_mix_clk", CLK_SET_RATE_PARENT | 
CLK_SET_RATE_UNGATE, APMU_SDH2, 0x12, 0x12, 0x0, 0, _lock}
+};
+
+static void pxa1908_axi_periph_clk_init(struct pxa1908_clk_unit *pxa_unit)
+{
+   struct mmp_clk_unit *unit = _unit->unit;
+
+   mmp_register_general_gate_clks(unit, pll1_gate_clks,
+   pxa_unit->base, ARRAY_SIZE(pll1_gate_clks));
+
+   sdh_mix_config.reg_info.reg_clk_ctrl = pxa_unit->base + APMU_SDH0;
+   mmp_clk_register_mix(NULL, "sdh0_mix_clk", sdh_parent_names,
+   ARRAY_SIZE(sdh_parent_names), CLK_SET_RATE_PARENT,
+   _mix_config, _lock);
+   sdh_mix_config.reg_info.reg_clk_ctrl = pxa_unit->base + APMU_SDH1;
+   mmp_clk_register_mix(NULL, "sdh1_mix_clk", sdh_parent_names,
+   ARRAY_SIZE(sdh_parent_names), CLK_SET_RATE_PARENT,
+   _mix_config, _lock);
+   sdh_mix_config.reg_info.reg_clk_ctrl = pxa_unit->base + APMU_SDH2;
+   mmp_clk_register_mix(NULL, "sdh2_mix_clk", sdh_parent_names,
+   ARRAY_SIZE(sdh_parent_names), CLK_SET_RATE_PARENT,
+   _mix_config, _lock);
+
+   mmp_register_gate_clks(unit, apmu_gate_clks, pxa_unit->base,
+   ARRAY_SIZE(apmu_gate_clks));
+}
+
+static int pxa1908_apmu_probe(struct platform_device *pdev)
+{
+   struct pxa1908_clk_unit *pxa_unit;
+
+   pxa_unit = devm_kzalloc(>dev, sizeof(*pxa_unit), GFP_KERNEL);
+   if (IS_ERR(pxa_unit))
+   return PTR_ERR(pxa_unit);
+
+

[PATCH RESEND v10 04/12] dt-bindings: clock: Add Marvell PXA1908 clock bindings

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add dt bindings and documentation for the Marvell PXA1908 clock
controller.

Reviewed-by: Conor Dooley 
Reviewed-by: Stephen Boyd 
Signed-off-by: Duje Mihanović 
---
 .../devicetree/bindings/clock/marvell,pxa1908.yaml | 48 
 include/dt-bindings/clock/marvell,pxa1908.h| 88 ++
 2 files changed, 136 insertions(+)

diff --git a/Documentation/devicetree/bindings/clock/marvell,pxa1908.yaml 
b/Documentation/devicetree/bindings/clock/marvell,pxa1908.yaml
new file mode 100644
index ..4e78933232b6
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/marvell,pxa1908.yaml
@@ -0,0 +1,48 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/clock/marvell,pxa1908.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Marvell PXA1908 Clock Controllers
+
+maintainers:
+  - Duje Mihanović 
+
+description: |
+  The PXA1908 clock subsystem generates and supplies clock to various
+  controllers within the PXA1908 SoC. The PXA1908 contains numerous clock
+  controller blocks, with the ones currently supported being APBC, APBCP, MPMU
+  and APMU roughly corresponding to internal buses.
+
+  All these clock identifiers could be found in 
.
+
+properties:
+  compatible:
+enum:
+  - marvell,pxa1908-apbc
+  - marvell,pxa1908-apbcp
+  - marvell,pxa1908-mpmu
+  - marvell,pxa1908-apmu
+
+  reg:
+maxItems: 1
+
+  '#clock-cells':
+const: 1
+
+required:
+  - compatible
+  - reg
+  - '#clock-cells'
+
+additionalProperties: false
+
+examples:
+  # APMU block:
+  - |
+clock-controller@d4282800 {
+  compatible = "marvell,pxa1908-apmu";
+  reg = <0xd4282800 0x400>;
+  #clock-cells = <1>;
+};
diff --git a/include/dt-bindings/clock/marvell,pxa1908.h 
b/include/dt-bindings/clock/marvell,pxa1908.h
new file mode 100644
index ..fb15b0d0cd4c
--- /dev/null
+++ b/include/dt-bindings/clock/marvell,pxa1908.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */
+#ifndef __DTS_MARVELL_PXA1908_CLOCK_H
+#define __DTS_MARVELL_PXA1908_CLOCK_H
+
+/* plls */
+#define PXA1908_CLK_CLK32  1
+#define PXA1908_CLK_VCTCXO 2
+#define PXA1908_CLK_PLL1_624   3
+#define PXA1908_CLK_PLL1_416   4
+#define PXA1908_CLK_PLL1_499   5
+#define PXA1908_CLK_PLL1_832   6
+#define PXA1908_CLK_PLL1_1248  7
+#define PXA1908_CLK_PLL1_D28
+#define PXA1908_CLK_PLL1_D49
+#define PXA1908_CLK_PLL1_D810
+#define PXA1908_CLK_PLL1_D16   11
+#define PXA1908_CLK_PLL1_D612
+#define PXA1908_CLK_PLL1_D12   13
+#define PXA1908_CLK_PLL1_D24   14
+#define PXA1908_CLK_PLL1_D48   15
+#define PXA1908_CLK_PLL1_D96   16
+#define PXA1908_CLK_PLL1_D13   17
+#define PXA1908_CLK_PLL1_3218
+#define PXA1908_CLK_PLL1_208   19
+#define PXA1908_CLK_PLL1_117   20
+#define PXA1908_CLK_PLL1_416_GATE  21
+#define PXA1908_CLK_PLL1_624_GATE  22
+#define PXA1908_CLK_PLL1_832_GATE  23
+#define PXA1908_CLK_PLL1_1248_GATE 24
+#define PXA1908_CLK_PLL1_D2_GATE   25
+#define PXA1908_CLK_PLL1_499_EN26
+#define PXA1908_CLK_PLL2VCO27
+#define PXA1908_CLK_PLL2   28
+#define PXA1908_CLK_PLL2P  29
+#define PXA1908_CLK_PLL2VCODIV330
+#define PXA1908_CLK_PLL3VCO31
+#define PXA1908_CLK_PLL3   32
+#define PXA1908_CLK_PLL3P  33
+#define PXA1908_CLK_PLL3VCODIV334
+#define PXA1908_CLK_PLL4VCO35
+#define PXA1908_CLK_PLL4   36
+#define PXA1908_CLK_PLL4P  37
+#define PXA1908_CLK_PLL4VCODIV338
+
+/* apb (apbc) peripherals */
+#define PXA1908_CLK_UART0  1
+#define PXA1908_CLK_UART1  2
+#define PXA1908_CLK_GPIO   3
+#define PXA1908_CLK_PWM0   4
+#define PXA1908_CLK_PWM1   5
+#define PXA1908_CLK_PWM2   6
+#define PXA1908_CLK_PWM3   7
+#define PXA1908_CLK_SSP0   8
+#define PXA1908_CLK_SSP1   9
+#define PXA1908_CLK_IPC_RST10
+#define PXA1908_CLK_RTC11
+#define PXA1908_CLK_TWSI0  12
+#define PXA1908_CLK_KPC13
+#define PXA1908_CLK_SWJTAG 14
+#define PXA1908_CLK_SSP2   15
+#define PXA1908_CLK_TWSI1  16
+#define PXA1908_CLK_THERMAL17
+#define PXA1908_CLK_TWSI3  18
+
+/* apb (apbcp) peripherals */
+#define PXA1908_CLK_UART2  1
+#define PXA1908_CLK_TWSI2  2
+#define PXA1908_CLK_AICER  3
+
+/* axi (apmu) peripherals */
+#define PXA1908_CLK_CCIC1  1
+#define PXA1908_CLK_ISP2
+#define PXA1908_CLK_DSI1   3
+#define

[PATCH RESEND v10 02/12] dt-bindings: pinctrl: pinctrl-single: add marvell,pxa1908-padconf compatible

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add the "marvell,pxa1908-padconf" compatible to allow migrating to a
separate pinctrl driver later.

Reviewed-by: Rob Herring 
Acked-by: Linus Walleij 
Signed-off-by: Duje Mihanović 
---
 Documentation/devicetree/bindings/pinctrl/pinctrl-single.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-single.yaml 
b/Documentation/devicetree/bindings/pinctrl/pinctrl-single.yaml
index c11495524dd2..1ce24ad8bc73 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-single.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-single.yaml
@@ -33,6 +33,10 @@ properties:
   - ti,omap5-padconf
   - ti,j7200-padconf
   - const: pinctrl-single
+  - items:
+  - enum:
+  - marvell,pxa1908-padconf
+  - const: pinconf-single
 
   reg:
 maxItems: 1

-- 
2.45.2

[PATCH RESEND v10 03/12] pinctrl: single: add marvell,pxa1908-padconf compatible

2024-06-18 Thread Duje Mihanović via B4 Relay

From: Duje Mihanović 

Add the "marvell,pxa1908-padconf" compatible to allow migrating to a
separate pinctrl driver later.

Acked-by: Linus Walleij 
Signed-off-by: Duje Mihanović 
---
 drivers/pinctrl/pinctrl-single.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c
index 19cc0db771a5..c15bf3cbabd7 100644
--- a/drivers/pinctrl/pinctrl-single.c
+++ b/drivers/pinctrl/pinctrl-single.c
@@ -1967,6 +1967,7 @@ static const struct pcs_soc_data pinconf_single = {
 };
 
 static const struct of_device_id pcs_of_match[] = {
+   { .compatible = "marvell,pxa1908-padconf", .data = _single },
{ .compatible = "ti,am437-padconf", .data = _single_am437x },
{ .compatible = "ti,am654-padconf", .data = _single_am654 },
{ .compatible = "ti,dra7-padconf", .data = _single_dra7 },

-- 
2.45.2

[PATCH RESEND v10 00/12] Initial Marvell PXA1908 support

2024-06-18 Thread Duje Mihanović via B4 Relay

Hello,

This series adds initial support for the Marvell PXA1908 SoC and
"samsung,coreprimevelte", a smartphone using the SoC.

USB works and the phone can boot a rootfs from an SD card, but there are
some warnings in the dmesg:

During SMP initialization:
[0.006519] CPU features: SANITY CHECK: Unexpected variation in 
SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU1: 0x00
[0.006542] CPU features: Unsupported CPU feature variation detected.
[0.006589] CPU1: Booted secondary processor 0x01 [0x410fd032]
[0.010710] Detected VIPT I-cache on CPU2
[0.010716] CPU features: SANITY CHECK: Unexpected variation in 
SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU2: 0x00
[0.010758] CPU2: Booted secondary processor 0x02 [0x410fd032]
[0.014849] Detected VIPT I-cache on CPU3
[0.014855] CPU features: SANITY CHECK: Unexpected variation in 
SYS_CNTFRQ_EL0. Boot CPU: 0x00018cba80, CPU3: 0x00
[0.014895] CPU3: Booted secondary processor 0x03 [0x410fd032]

SMMU probing fails:
[0.101798] arm-smmu c001.iommu: probing hardware configuration...
[0.101809] arm-smmu c001.iommu: SMMUv1 with:
[0.101816] arm-smmu c001.iommu: no translation support!

A 3.14 based Marvell tree is available on GitHub
acorn-marvell/brillo_pxa_kernel, and a Samsung one on GitHub
CoderCharmander/g361f-kernel.

Andreas Färber attempted to upstream support for this SoC in 2017:
https://lore.kernel.org/lkml/20170222022929.10540-1-afaer...@suse.de/

Signed-off-by: Duje Mihanović 

Changes in v10:
- Update trailers
- Rebase on v6.9-rc5
- Clock driver changes:
  - Add a couple of forgotten clocks in APBC
- The clocks are thermal_clk, ipc_clk, ssp0_clk, ssp2_clk and swjtag
- The IDs and register offsets were already present, but I forgot to
  actually register them
  - Split each controller block into own file
  - Drop unneeded -of in clock driver filenames
  - Simplify struct pxa1908_clk_unit
  - Convert to platform driver
  - Add module metadata
- DTS changes:
  - Properly name pinctrl nodes
  - Drop pinctrl #size-cells, #address-cells, ranges and #gpio-size-cells
  - Fix pinctrl input-schmitt configuration
- Link to v9: 
https://lore.kernel.org/20240402-pxa1908-lkml-v9-0-25a003e83...@skole.hr

Changes in v9:
- Update trailers and rebase on v6.9-rc2, no changes
- Link to v8: 
https://lore.kernel.org/20240110-pxa1908-lkml-v8-0-fea768a59...@skole.hr

Changes in v8:
- Drop SSPA patch
- Drop broken-cd from eMMC node
- Specify S-Boot hardcoded initramfs location in device tree
- Add ARM PMU node
- Correct inverted modem memory base and size
- Update trailers
- Rebase on next-20240110
- Link to v7: 
https://lore.kernel.org/20231102-pxa1908-lkml-v7-0-cabb1a0cb...@skole.hr
  and https://lore.kernel.org/20231102152033.5511-1-duje.mihano...@skole.hr

Changes in v7:
- Suppress SND_MMP_SOC_SSPA on ARM64
- Update trailers
- Rebase on v6.6-rc7
- Link to v6: 
https://lore.kernel.org/r/20231010-pxa1908-lkml-v6-0-b2fe09240...@skole.hr

Changes in v6:
- Address maintainer comments:
  - Add "marvell,pxa1908-padconf" binding to pinctrl-single driver
- Drop GPIO patch as it's been pulled
- Update trailers
- Rebase on v6.6-rc5
- Link to v5: 
https://lore.kernel.org/r/20230812-pxa1908-lkml-v5-0-a5d51937e...@skole.hr

Changes in v5:
- Address maintainer comments:
  - Move *_NR_CLKS to clock driver from dt binding file
- Allocate correct number of clocks for each block instead of blindly
  allocating 50 for each
- Link to v4: 
https://lore.kernel.org/r/20230807-pxa1908-lkml-v4-0-cb387d73b...@skole.hr

Changes in v4:
- Address maintainer comments:
  - Relicense clock binding file to BSD-2
- Add pinctrl-names to SD card node
- Add vgic registers to GIC node
- Rebase on v6.5-rc5
- Link to v3: 
https://lore.kernel.org/r/20230804-pxa1908-lkml-v3-0-8e48fca37...@skole.hr

Changes in v3:
- Address maintainer comments:
  - Drop GPIO dynamic allocation patch
  - Move clock register offsets into driver (instead of bindings file)
  - Add missing Tested-by trailer to u32_fract patch
  - Move SoC binding to arm/mrvl/mrvl.yaml
- Add serial0 alias and stdout-path to board dts to enable UART
  debugging
- Rebase on v6.5-rc4
- Link to v2: 
https://lore.kernel.org/r/20230727162909.6031-1-duje.mihano...@skole.hr

Changes in v2:
- Remove earlycon patch as it's been merged into tty-next
- Address maintainer comments:
  - Clarify GPIO regressions on older PXA platforms
  - Add Fixes tag to commit disabling GPIO pinctrl calls for this SoC
  - Add missing includes to clock driver
  - Clock driver uses HZ_PER_MHZ, u32_fract and GENMASK
  - Dual license clock bindings
  - Change clock IDs to decimal
  - Fix underscores in dt node names
  - Move chosen node to top of board dts
  - Clean up documentation
  - Reorder commits
  - Drop pxa,rev-id
- Rename muic-i2c to i2c-muic
- Reword some commits
- Move framebuffer node to chosen
- Add aliases for mmc nodes
- Rebase on v6.5-rc3

Re: [PATCH] riscv: Fix early ftrace nop patching

2024-06-18 Thread Alexandre Ghiti


Hi Conor,

On 17/06/2024 15:23, Alexandre Ghiti wrote:

Hi Conor,

Sorry for the delay here.

On 13/06/2024 09:48, Conor Dooley wrote:

On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:

Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
converted ftrace_make_nop() to use patch_insn_write() which does not
emit any icache flush relying entirely on __ftrace_modify_code() to do
that.

But we missed that ftrace_make_nop() was called very early directly 
when
converting mcount calls into nops (actually on riscv it converts 2B 
nops

emitted by the compiler into 4B nops).

This caused crashes on multiple HW as reported by Conor and Björn since
the booting core could have half-patched instructions in its icache
which would trigger an illegal instruction trap: fix this by emitting a
local flush icache when early patching nops.

Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/include/asm/cacheflush.h | 6 ++
  arch/riscv/kernel/ftrace.c  | 3 +++
  2 files changed, 9 insertions(+)

diff --git a/arch/riscv/include/asm/cacheflush.h 
b/arch/riscv/include/asm/cacheflush.h

index dd8d07146116..ce79c558a4c8 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
  asm volatile ("fence.i" ::: "memory");
  }
  +static inline void local_flush_icache_range(unsigned long start,
+    unsigned long end)
+{
+    local_flush_icache_all();
+}
+
  #define PG_dcache_clean PG_arch_1
    static inline void flush_dcache_folio(struct folio *folio)
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 4f4987a6d83d..32e7c401dfb4 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct 
dyn_ftrace *rec)

  out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
  mutex_unlock(_mutex);

So, turns out that this patch is not sufficient. I've seen some more
crashes, seemingly due to initialising nops on this mutex_unlock().
Palmer suggested moving the if (!mod) ... into the lock, which fixed
the problem for me.



Ok, it makes sense, I completely missed that. I'll send a fix for that 
shortly so that it can be merged in rc5.


Thanks,

Alex



So I digged a bit more and I'm afraid that the only way to make sure 
this issue does not happen elsewhere is to flush the icache right after 
the patching. We actually can't wait to batch the icache flush since 
along the way, we may call a function that has just been patched (the 
issue that you encountered here).


I don't know how much it will impact the performance but I guess it will.

Unless someone has a better idea (I added Andy and Puranjay in cc), here 
is the patch that implements this, can you give it a try? Thanks


diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 87cbd86576b2..4b95c574fd04 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct 
dyn_ftrace *rec)

    out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
    mutex_unlock(_mutex);

-   if (!mod)
-   local_flush_icache_range(rec->ip, rec->ip + 
MCOUNT_INSN_SIZE);

-
    return out;
 }

@@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
    } else {
    while (atomic_read(>cpu_count) <= num_online_cpus())
    cpu_relax();
-   }

-   local_flush_icache_all();
+   local_flush_icache_all();
+   }

    return 0;
 }
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 4007563fb607..ab03732d06c4 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c, size_t len)

    memset(waddr, c, len);

+   /*
+    * We could have just patched a function that is about to be
+    * called so make sure we don't execute partially patched
+    * instructions by flushing the icache as soon as possible.
+    */
+   local_flush_icache_range((unsigned long)waddr,
+    (unsigned long)waddr + len);
+
    patch_unmap(FIX_TEXT_POKE0);

    if (across_pages)
@@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const 
void *insn, size_t len)


    ret = copy_to_kernel_nofault(waddr, insn, len);

+   /*
+    * We could have just patched a function that is about to be
+    * called so make sure we don't execute partially patched
+    * instructions by flushing the icache as soon as possible.
+    */
+   local_flush_icache_range((unsigned long)waddr,
+    (unsigned long)waddr + len);
+
    patch_unmap(FIX_TEXT_POKE0);

    if (across_pages)
@@ -189,9 +205,6 @@ int patch_text_set_nosync(void

Re: [PATCH v2 3/3] ARM: dts: qcom: msm8926-motorola-peregrine: Add framebuffer supplies

2024-06-18 Thread Konrad Dybcio





On 6/17/24 23:22, André Apitzsch via B4 Relay wrote:

From: André Apitzsch 

Add regulators used by the framebuffer of Motorola Moto G 4G (2013).

Signed-off-by: André Apitzsch 
---


Reviewed-by: Konrad Dybcio 

Konrad

Re: [PATCH v2 2/3] ARM: dts: qcom: msm8926-motorola-peregrine: Update temperature sensor

2024-06-18 Thread Konrad Dybcio





On 6/17/24 23:22, André Apitzsch via B4 Relay wrote:

From: André Apitzsch 

Add alert interrupt for the temperature sensor of Motorola Moto G 4G
(2013), although not used by the driver yet.

Signed-off-by: André Apitzsch 
---


Reviewed-by: Konrad Dybcio 

Konrad

Re: [PATCH v2 1/3] ARM: dts: qcom: msm8926-motorola-peregrine: Add accelerometer, magnetometer, regulator

2024-06-18 Thread Konrad Dybcio





On 6/17/24 23:22, André Apitzsch via B4 Relay wrote:

From: André Apitzsch 

Add the accelerometer, magnetometer and regulator that are present on
the Motorola Moto G 4G (2013) device.

Signed-off-by: André Apitzsch 
---


Reviewed-by: Konrad Dybcio 

Konrad

Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up

2024-06-18 Thread Alexander Graf



On 18.06.24 12:21, Ard Biesheuvel wrote:

On Mon, 17 Jun 2024 at 23:01, Alexander Graf  wrote:

On 17.06.24 22:40, Steven Rostedt wrote:

On Mon, 17 Jun 2024 09:07:29 +0200
Alexander Graf  wrote:


Hey Steve,


I believe we're talking about 2 different things :). Let me rephrase a
bit and make a concrete example.

Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command
line option. The kernel now comes up and allocates a random chunk of
memory that - by (admittedly good) chance - may be at the same physical
location as before. Let's assume it deemed 0x100 as a good offset.

Note, it's not random. It picks from the top of available memory every
time. But things can mess with it (see below).


Let's now assume you're running on a UEFI system. There, you always have
non-volatile storage available to you even in the pre-boot phase. That
means the kernel could create a UEFI variable that says "12M:4096:trace
-> 0x100". The pre-boot phase takes all these UEFI variables and
marks them as reserved. When you finally reach your command line parsing
logic for reserve_mem=, you can flip all reservations that were not on
the command line back to normal memory.

That way you have pretty much guaranteed persistent memory regions, even
with KASLR changing your memory layout across boots.

The nice thing is that the above is an extension of what you've already
built: Systems with UEFI simply get better guarantees that their regions
persist.

This could be an added feature, but it is very architecture specific,
and would likely need architecture specific updates.


It definitely would be an added feature, yes. But one that allows you to
ensure persistence a lot more safely :).



Requirement:

Need a way to reserve memory that will be at a consistent location for
every boot, if the kernel and system are the same. Does not need to work
if rebooting to a different kernel, or if the system can change the
memory layout between boots.

The reserved memory can not be an hard coded address, as the same kernel /
command line needs to run on several different machines. The picked memory
reservation just needs to be the same for a given machine, but may be

With KASLR is enabled, doesn't this approach break too often to be
reliable enough for the data you want to extract?

Picking up the idea above, with a persistent variable we could even make
KASLR avoid that reserved pstore region in its search for a viable KASLR
offset.

I think I was hit by it once in all my testing. For our use case, the
few times it fails to map is not going to affect what we need this for
at all.

Once is pretty good. Do you know why? Also once out of how many runs? Is
the randomness source not as random as it should be or are the number of
bits for KASLR maybe so few on your target architecture that the odds of
hitting anything become low? Do these same constraints hold true outside
of your testing environment?

So I just ran it a hundred times in a loop. I added a patch to print
the location of "_text". The loop was this:

for i in `seq 100`; do
  ssh root@debiantesting-x86-64 "dmesg | grep -e 'text starts' -e 'mapped boot'  
>> text; grub-reboot '1>0'; sleep 0.5; reboot"
  sleep 25
done

It searches dmesg for my added printk as well as the print of were the
ring buffer was loaded in physical memory.

It takes about 15 seconds to reboot, so I waited 25. The results are
attached. I found that it was consistent 76 times, which means 1 out of
4 it's not. Funny enough, it broke whenever it loaded the kernel below
0x1. And then it would be off by a little.

It was consistently at:

0x27d00

And when it failed, it was at 0x27ce0.

Note, when I used the e820 tables to do this, I never saw a failure. My
assumption is that when it is below 0x1, something else gets
allocated causing this to get pushed down.


Thinking about it again: What if you run the allocation super early (see
arch/x86/boot/compressed/kaslr.c:handle_mem_options())?

That code is not used by EFI boot anymore.

In general, I would recommend (and have recommended) against these
kinds of hacks in mainline, because -especially on x86- there is
always someone that turns up with some kind of convoluted use case
that gets broken if we try to change anything in the boot code.

I spent considerable time over the past year making the EFI/x86 boot
code compatible with the new MS imposed requirements on PC boot
firmware (related to secure boot and NX restrictions on memory
mappings). This involved some radical refactoring of the boot
sequence, including the KASLR logic. Adding fragile code there that
will result in regressions observable to end users when it gets broken
is really not what I'd like to see.

So I would personally prefer for this code not to go in at all. But if
it does go in (and Steven has already agreed to this), it needs a
giant disclaimer that it is best effort and may get broken
inadvertently by changes that may

Re: [PATCH v15 05/14] x86/sgx: Implement basic EPC misc cgroup functionality

2024-06-18 Thread Huang, Kai


> @@ -921,7 +956,8 @@ static int __init sgx_init(void)
>   if (!sgx_page_cache_init())
>   return -ENOMEM;
>  
> - if (!sgx_page_reclaimer_init()) {
> + if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
> + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
>   ret = -ENOMEM;
>   goto err_page_cache;
>   }

This code change is wrong due to two reasons:

1) If sgx_page_reclaimer_init() was successful, but sgx_cgroup_init()
failed, you actually need to 'goto err_kthread' because the ksgxd() kernel
thread is already created and is running.

2) There are other cases after here that can also result in sgx_init() to
fail completely, e.g., registering sgx_dev_provision mics device.  We need
to reset the capacity to 0 for those cases as well.

AFAICT, you need something like:

diff --git a/arch/x86/kernel/cpu/sgx/main.c
b/arch/x86/kernel/cpu/sgx/main.c
index 27892e57c4ef..46f9c26992a7 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -930,6 +930,10 @@ static int __init sgx_init(void)
if (ret)
goto err_kthread;
 
+   ret = sgx_cgroup_init();
+   if (ret)
+   goto err_provision;
+
/*
 * Always try to initialize the native *and* KVM drivers.
 * The KVM driver is less picky than the native one and
@@ -941,10 +945,12 @@ static int __init sgx_init(void)
ret = sgx_drv_init();
 
if (sgx_vepc_init() && ret)
-   goto err_provision;
+   goto err_cgroup;
 
return 0;
 
+err_cgroup:
+   /* SGX EPC cgroup cleanup */
 err_provision:
misc_deregister(_dev_provision);
 
@@ -952,6 +958,8 @@ static int __init sgx_init(void)
kthread_stop(ksgxd_tsk);
 
 err_page_cache:
+   misc_misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, 0);
+
for (i = 0; i < sgx_nr_epc_sections; i++) {
vfree(sgx_epc_sections[i].pages);
memunmap(sgx_epc_sections[i].virt_addr);


I put the sgx_cgroup_init() before sgx_drv_init() and sgx_vepc_init(),
otherwise you will need sgx_drv_cleanup() and sgx_vepc_cleanup()
respectively when sgx_cgroup_init() fails.

This looks a little bit weird too, though:

Calling misc_misc_cg_set_capacity() to reset capacity to 0 is done at end
of sgx_init() error path, because the "set capacity" part is done in
sgx_epc_cache_init().  

But logically, both "set capacity" and "reset capacity to 0" should be SGX
EPC cgroup operation, so it's more reasonable to do "set capacity" in
sgx_cgroup_init() and do "reset to 0" in the

/* SGX EPC cgroup cleanup */

as shown above.

Eventually, you will need to do EPC cgroup cleanup anyway, e.g., to free
the workqueue, so it's odd to have two places to handle EPC cgroup
cleanup.

I understand the reason "set capacity" part is done in
sgx_page_cache_init() now is because in that function you can easily get
the capacity.  But the fact is @sgx_numa_nodes also tracks EPC size for
each node, so you can also get the total EPC size from @sgx_numa_node in
sgx_cgroup_init() and set capacity there.

In this case, you can put "reset capacity to 0" and "free workqueue"
together as the "SGX EPC cgroup cleanup", which is way more clear IMHO.

Re: [PATCH 1/2] vdpa: support set mac address from vdpa tool

2024-06-18 Thread Michael S. Tsirkin

On Mon, Jun 17, 2024 at 09:44:21AM -0700, Jakub Kicinski wrote:
> On Mon, 17 Jun 2024 12:20:19 -0400 Michael S. Tsirkin wrote:
> > > But the virtio spec doesn't allow setting the MAC...
> > > I'm probably just lost in the conversation but there's hypervisor side
> > > and there is user/VM side, each of them already has an interface to set
> > > the MAC. The MAC doesn't matter, but I want to make sure my mental model
> > > matches reality in case we start duplicating too much..  
> > 
> > An obvious part of provisioning is specifying the config space
> > of the device.
> 
> Agreed, that part is obvious.
> Please go ahead, I don't really care and you clearly don't have time
> to explain.

Thanks!
Just in case Cindy who is working on it is also confused,
here is what I meant:

- an interface to provision a device, including its config
  space, makes sense to me
- default mac address is part of config space, and would thus be covered
- note how this is different from ability to tweak the mac of an existing
  device


-- 
MST

Re: [PATCH v6 0/2] mm/memblock: Add "reserve_mem" to reserved named memory at boot up

2024-06-18 Thread Ard Biesheuvel

On Mon, 17 Jun 2024 at 23:01, Alexander Graf  wrote:
>
> On 17.06.24 22:40, Steven Rostedt wrote:
> > On Mon, 17 Jun 2024 09:07:29 +0200
> > Alexander Graf  wrote:
> >
> >> Hey Steve,
> >>
> >>
> >> I believe we're talking about 2 different things :). Let me rephrase a
> >> bit and make a concrete example.
> >>
> >> Imagine you have passed the "reserve_mem=12M:4096:trace" kernel command
> >> line option. The kernel now comes up and allocates a random chunk of
> >> memory that - by (admittedly good) chance - may be at the same physical
> >> location as before. Let's assume it deemed 0x100 as a good offset.
> > Note, it's not random. It picks from the top of available memory every
> > time. But things can mess with it (see below).
> >
> >> Let's now assume you're running on a UEFI system. There, you always have
> >> non-volatile storage available to you even in the pre-boot phase. That
> >> means the kernel could create a UEFI variable that says "12M:4096:trace
> >> -> 0x100". The pre-boot phase takes all these UEFI variables and
> >> marks them as reserved. When you finally reach your command line parsing
> >> logic for reserve_mem=, you can flip all reservations that were not on
> >> the command line back to normal memory.
> >>
> >> That way you have pretty much guaranteed persistent memory regions, even
> >> with KASLR changing your memory layout across boots.
> >>
> >> The nice thing is that the above is an extension of what you've already
> >> built: Systems with UEFI simply get better guarantees that their regions
> >> persist.
> > This could be an added feature, but it is very architecture specific,
> > and would likely need architecture specific updates.
>
>
> It definitely would be an added feature, yes. But one that allows you to
> ensure persistence a lot more safely :).
>
>
> > Requirement:
> >
> > Need a way to reserve memory that will be at a consistent location for
> > every boot, if the kernel and system are the same. Does not need to work
> > if rebooting to a different kernel, or if the system can change the
> > memory layout between boots.
> >
> > The reserved memory can not be an hard coded address, as the same 
> > kernel /
> > command line needs to run on several different machines. The picked 
> > memory
> > reservation just needs to be the same for a given machine, but may be
>  With KASLR is enabled, doesn't this approach break too often to be
>  reliable enough for the data you want to extract?
> 
>  Picking up the idea above, with a persistent variable we could even make
>  KASLR avoid that reserved pstore region in its search for a viable KASLR
>  offset.
> >>> I think I was hit by it once in all my testing. For our use case, the
> >>> few times it fails to map is not going to affect what we need this for
> >>> at all.
> >> Once is pretty good. Do you know why? Also once out of how many runs? Is
> >> the randomness source not as random as it should be or are the number of
> >> bits for KASLR maybe so few on your target architecture that the odds of
> >> hitting anything become low? Do these same constraints hold true outside
> >> of your testing environment?
> > So I just ran it a hundred times in a loop. I added a patch to print
> > the location of "_text". The loop was this:
> >
> >for i in `seq 100`; do
> >  ssh root@debiantesting-x86-64 "dmesg | grep -e 'text starts' -e 
> > 'mapped boot'  >> text; grub-reboot '1>0'; sleep 0.5; reboot"
> >  sleep 25
> >done
> >
> > It searches dmesg for my added printk as well as the print of were the
> > ring buffer was loaded in physical memory.
> >
> > It takes about 15 seconds to reboot, so I waited 25. The results are
> > attached. I found that it was consistent 76 times, which means 1 out of
> > 4 it's not. Funny enough, it broke whenever it loaded the kernel below
> > 0x1. And then it would be off by a little.
> >
> > It was consistently at:
> >
> >0x27d00
> >
> > And when it failed, it was at 0x27ce0.
> >
> > Note, when I used the e820 tables to do this, I never saw a failure. My
> > assumption is that when it is below 0x1, something else gets
> > allocated causing this to get pushed down.
>
>
> Thinking about it again: What if you run the allocation super early (see
> arch/x86/boot/compressed/kaslr.c:handle_mem_options())?

That code is not used by EFI boot anymore.

In general, I would recommend (and have recommended) against these
kinds of hacks in mainline, because -especially on x86- there is
always someone that turns up with some kind of convoluted use case
that gets broken if we try to change anything in the boot code.

I spent considerable time over the past year making the EFI/x86 boot
code compatible with the new MS imposed requirements on PC boot
firmware (related to secure boot and NX restrictions on memory
mappings). This involved some radical refactoring of the boot
sequence, including the

Re: [PATCH v4 32/35] s390/uaccess: Add KMSAN support to put_user() and get_user()

2024-06-18 Thread Ilya Leoshkevich

On Tue, 2024-06-18 at 11:52 +0200, Alexander Potapenko wrote:
> On Tue, Jun 18, 2024 at 11:40 AM Ilya Leoshkevich 
> wrote:
> > 
> > On Tue, 2024-06-18 at 11:24 +0200, Alexander Potapenko wrote:
> > > On Thu, Jun 13, 2024 at 5:39 PM Ilya Leoshkevich
> > > 
> > > wrote:
> > > > 
> > > > put_user() uses inline assembly with precise constraints, so
> > > > Clang
> > > > is
> > > > in principle capable of instrumenting it automatically.
> > > > Unfortunately,
> > > > one of the constraints contains a dereferenced user pointer,
> > > > and
> > > > Clang
> > > > does not currently distinguish user and kernel pointers.
> > > > Therefore
> > > > KMSAN attempts to access shadow for user pointers, which is not
> > > > a
> > > > right
> > > > thing to do.
> > > > 
> > > > An obvious fix to add __no_sanitize_memory to __put_user_fn()
> > > > does
> > > > not
> > > > work, since it's __always_inline. And __always_inline cannot be
> > > > removed
> > > > due to the __put_user_bad() trick.
> > > > 
> > > > A different obvious fix of using the "a" instead of the "+Q"
> > > > constraint
> > > > degrades the code quality, which is very important here, since
> > > > it's
> > > > a
> > > > hot path.
> > > > 
> > > > Instead, repurpose the __put_user_asm() macro to define
> > > > __put_user_{char,short,int,long}_noinstr() functions and mark
> > > > them
> > > > with
> > > > __no_sanitize_memory. For the non-KMSAN builds make them
> > > > __always_inline in order to keep the generated code quality.
> > > > Also
> > > > define __put_user_{char,short,int,long}() functions, which call
> > > > the
> > > > aforementioned ones and which *are* instrumented, because they
> > > > call
> > > > KMSAN hooks, which may be implemented as macros.
> > > 
> > > I am not really familiar with s390 assembly, but I think you
> > > still
> > > need to call kmsan_copy_to_user() and kmsan_copy_from_user() to
> > > properly initialize the copied data and report infoleaks.
> > > Would it be possible to insert calls to linux/instrumented.h
> > > hooks
> > > into uaccess functions?
> > 
> > Aren't the existing instrument_get_user() / instrument_put_user()
> > calls
> > sufficient?
> 
> Oh, sorry, I overlooked them. Yes, those should be sufficient.
> But you don't include linux/instrumented.h, do you?

No, apparently we get this include from somewhere else by accident.
I will add it in a separate patch.

Re: [PATCH 1/8] riscv: stacktrace: convert arch_stack_walk() to noinstr

2024-06-18 Thread Alexandre Ghiti


Hi Andy,

On 13/06/2024 09:11, Andy Chiu wrote:

arch_stack_walk() is called intensively in function_graph when the
kernel is compiled with CONFIG_TRACE_IRQFLAGS. As a result, the kernel
logs a lot of arch_stack_walk and its sub-functions into the ftrace
buffer. However, these functions should not appear on the trace log
because they are part of the ftrace itself. This patch references what
arm64 does for the smae function. So it further prevent the re-enter
kprobe issue, which is also possible on riscv.

Related-to: commit 0fbcd8abf337 ("arm64: Prohibit instrumentation on 
arch_stack_walk()")
Fixes: 680341382da5 ("riscv: add CALLER_ADDRx support")
Signed-off-by: Andy Chiu 
---
  arch/riscv/kernel/stacktrace.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
index 528ec7cc9a62..0d3f00eb0bae 100644
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -156,7 +156,7 @@ unsigned long __get_wchan(struct task_struct *task)
return pc;
  }
  
-noinline void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,

+noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry, 
void *cookie,
 struct task_struct *task, struct pt_regs *regs)
  {
walk_stackframe(task, regs, consume_entry, cookie);



Reviewed-by: Alexandre Ghiti 

I'll try to make this go into -fixes, this is in my fixes branch at least.

Thanks,

Alex

Re: [PATCH v4 32/35] s390/uaccess: Add KMSAN support to put_user() and get_user()

2024-06-18 Thread Alexander Potapenko

On Tue, Jun 18, 2024 at 11:40 AM Ilya Leoshkevich  wrote:
>
> On Tue, 2024-06-18 at 11:24 +0200, Alexander Potapenko wrote:
> > On Thu, Jun 13, 2024 at 5:39 PM Ilya Leoshkevich 
> > wrote:
> > >
> > > put_user() uses inline assembly with precise constraints, so Clang
> > > is
> > > in principle capable of instrumenting it automatically.
> > > Unfortunately,
> > > one of the constraints contains a dereferenced user pointer, and
> > > Clang
> > > does not currently distinguish user and kernel pointers. Therefore
> > > KMSAN attempts to access shadow for user pointers, which is not a
> > > right
> > > thing to do.
> > >
> > > An obvious fix to add __no_sanitize_memory to __put_user_fn() does
> > > not
> > > work, since it's __always_inline. And __always_inline cannot be
> > > removed
> > > due to the __put_user_bad() trick.
> > >
> > > A different obvious fix of using the "a" instead of the "+Q"
> > > constraint
> > > degrades the code quality, which is very important here, since it's
> > > a
> > > hot path.
> > >
> > > Instead, repurpose the __put_user_asm() macro to define
> > > __put_user_{char,short,int,long}_noinstr() functions and mark them
> > > with
> > > __no_sanitize_memory. For the non-KMSAN builds make them
> > > __always_inline in order to keep the generated code quality. Also
> > > define __put_user_{char,short,int,long}() functions, which call the
> > > aforementioned ones and which *are* instrumented, because they call
> > > KMSAN hooks, which may be implemented as macros.
> >
> > I am not really familiar with s390 assembly, but I think you still
> > need to call kmsan_copy_to_user() and kmsan_copy_from_user() to
> > properly initialize the copied data and report infoleaks.
> > Would it be possible to insert calls to linux/instrumented.h hooks
> > into uaccess functions?
>
> Aren't the existing instrument_get_user() / instrument_put_user() calls
> sufficient?

Oh, sorry, I overlooked them. Yes, those should be sufficient.
But you don't include linux/instrumented.h, do you?

Re: [PATCH v4 32/35] s390/uaccess: Add KMSAN support to put_user() and get_user()

2024-06-18 Thread Ilya Leoshkevich

On Tue, 2024-06-18 at 11:24 +0200, Alexander Potapenko wrote:
> On Thu, Jun 13, 2024 at 5:39 PM Ilya Leoshkevich 
> wrote:
> > 
> > put_user() uses inline assembly with precise constraints, so Clang
> > is
> > in principle capable of instrumenting it automatically.
> > Unfortunately,
> > one of the constraints contains a dereferenced user pointer, and
> > Clang
> > does not currently distinguish user and kernel pointers. Therefore
> > KMSAN attempts to access shadow for user pointers, which is not a
> > right
> > thing to do.
> > 
> > An obvious fix to add __no_sanitize_memory to __put_user_fn() does
> > not
> > work, since it's __always_inline. And __always_inline cannot be
> > removed
> > due to the __put_user_bad() trick.
> > 
> > A different obvious fix of using the "a" instead of the "+Q"
> > constraint
> > degrades the code quality, which is very important here, since it's
> > a
> > hot path.
> > 
> > Instead, repurpose the __put_user_asm() macro to define
> > __put_user_{char,short,int,long}_noinstr() functions and mark them
> > with
> > __no_sanitize_memory. For the non-KMSAN builds make them
> > __always_inline in order to keep the generated code quality. Also
> > define __put_user_{char,short,int,long}() functions, which call the
> > aforementioned ones and which *are* instrumented, because they call
> > KMSAN hooks, which may be implemented as macros.
> 
> I am not really familiar with s390 assembly, but I think you still
> need to call kmsan_copy_to_user() and kmsan_copy_from_user() to
> properly initialize the copied data and report infoleaks.
> Would it be possible to insert calls to linux/instrumented.h hooks
> into uaccess functions?

Aren't the existing instrument_get_user() / instrument_put_user() calls
sufficient?

[PATCH] filemap: add trace events for get_pages, map_pages, and fault

2024-06-18 Thread Takaya Saeki

To allow precise tracking of page caches accessed, add new tracepoints
that trigger when a process actually accesses them.

The ureadahead program used by ChromeOS traces the disk access of
programs as they start up at boot up. It uses mincore(2) or the
'mm_filemap_add_to_page_cache' trace event to accomplish this. It stores
this information in a "pack" file and on subsequent boots, it will read
the pack file and call readahead(2) on the information so that disk
storage can be loaded into RAM before the applications actually need it.

A problem we see is that due to the kernel's readahead algorithm that
can aggressively pull in more data than needed (to try and accomplish
the same goal) and this data is also recorded. The end result is that
the pack file contains a lot of pages on disk that are never actually
used. Calling readahead(2) on these unused pages can slow down the
system boot up times.

To solve this, add 3 new trace events, get_pages, map_pages, and fault.
These will be used to trace the pages are not only pulled in from disk,
but are actually used by the application. Only those pages will be
stored in the pack file, and this helps out the performance of boot up.

With the combination of these 3 new trace events and
mm_filemap_add_to_page_cache, we observed a reduction in the pack file
by 7.3% - 20% on ChromeOS varying by device.

Signed-off-by: Takaya Saeki 
---
 include/trace/events/filemap.h | 84 ++
 mm/filemap.c   |  4 ++
 2 files changed, 88 insertions(+)

diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index 46c89c1e460c..68c705572c4f 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -56,6 +56,90 @@ DEFINE_EVENT(mm_filemap_op_page_cache, 
mm_filemap_add_to_page_cache,
TP_ARGS(folio)
);
 
+DECLARE_EVENT_CLASS(mm_filemap_op_page_cache_range,
+
+   TP_PROTO(
+   struct address_space *mapping,
+   pgoff_t index,
+   pgoff_t last_index
+   ),
+
+   TP_ARGS(mapping, index, last_index),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, i_ino)
+   __field(dev_t, s_dev)
+   __field(unsigned long, index)
+   __field(unsigned long, last_index)
+   ),
+
+   TP_fast_assign(
+   __entry->i_ino = mapping->host->i_ino;
+   if (mapping->host->i_sb)
+   __entry->s_dev =
+   mapping->host->i_sb->s_dev;
+   else
+   __entry->s_dev = mapping->host->i_rdev;
+   __entry->index = index;
+   __entry->last_index = last_index;
+   ),
+
+   TP_printk(
+   "dev %d:%d ino %lx ofs=%lu max_ofs=%lu",
+   MAJOR(__entry->s_dev),
+   MINOR(__entry->s_dev), __entry->i_ino,
+   __entry->index << PAGE_SHIFT,
+   __entry->last_index << PAGE_SHIFT
+   )
+);
+
+DEFINE_EVENT(mm_filemap_op_page_cache_range, mm_filemap_get_pages,
+   TP_PROTO(
+   struct address_space *mapping,
+   pgoff_t index,
+   pgoff_t last_index
+   ),
+   TP_ARGS(mapping, index, last_index)
+);
+
+DEFINE_EVENT(mm_filemap_op_page_cache_range, mm_filemap_map_pages,
+   TP_PROTO(
+   struct address_space *mapping,
+   pgoff_t index,
+   pgoff_t last_index
+   ),
+   TP_ARGS(mapping, index, last_index)
+);
+
+TRACE_EVENT(mm_filemap_fault,
+   TP_PROTO(struct address_space *mapping, pgoff_t index),
+
+   TP_ARGS(mapping, index),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, i_ino)
+   __field(dev_t, s_dev)
+   __field(unsigned long, index)
+   ),
+
+   TP_fast_assign(
+   __entry->i_ino = mapping->host->i_ino;
+   if (mapping->host->i_sb)
+   __entry->s_dev =
+   mapping->host->i_sb->s_dev;
+   else
+   __entry->s_dev = mapping->host->i_rdev;
+   __entry->index = index;
+   ),
+
+   TP_printk(
+   "dev %d:%d ino %lx ofs=%lu",
+   MAJOR(__entry->s_dev),
+   MINOR(__entry->s_dev), __entry->i_ino,
+   __entry->index << PAGE_SHIFT
+   )
+);
+
 TRACE_EVENT(filemap_set_wb_err,
TP_PROTO(struct address_space *mapping, errseq_t eseq),
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 876cc64aadd7..39f9d7fb3d2c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2556,6 +2556,7 @@ static int filemap_get_pages(struct kiocb *iocb, size_t 
count,
goto err;
}
 
+   trace_mm_filemap_get_pages(mapping, index, last_index);
return 0;
 err:
if (err < 0)
@@ -3286,6 +3287,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
if (unlikely(index >= max_idx))

Re: [PATCH v4 26/35] s390/diag: Unpoison diag224() output buffer

2024-06-18 Thread Alexander Potapenko

On Thu, Jun 13, 2024 at 5:40 PM Ilya Leoshkevich  wrote:
>
> Diagnose 224 stores 4k bytes, which currently cannot be deduced from
> the inline assembly constraints. This leads to KMSAN false positives.
>
> Fix the constraints by using a 4k-sized struct instead of a raw
> pointer. While at it, prettify them too.
>
> Suggested-by: Heiko Carstens 
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko

Re: [PATCH v4 32/35] s390/uaccess: Add KMSAN support to put_user() and get_user()

2024-06-18 Thread Alexander Potapenko

On Thu, Jun 13, 2024 at 5:39 PM Ilya Leoshkevich  wrote:
>
> put_user() uses inline assembly with precise constraints, so Clang is
> in principle capable of instrumenting it automatically. Unfortunately,
> one of the constraints contains a dereferenced user pointer, and Clang
> does not currently distinguish user and kernel pointers. Therefore
> KMSAN attempts to access shadow for user pointers, which is not a right
> thing to do.
>
> An obvious fix to add __no_sanitize_memory to __put_user_fn() does not
> work, since it's __always_inline. And __always_inline cannot be removed
> due to the __put_user_bad() trick.
>
> A different obvious fix of using the "a" instead of the "+Q" constraint
> degrades the code quality, which is very important here, since it's a
> hot path.
>
> Instead, repurpose the __put_user_asm() macro to define
> __put_user_{char,short,int,long}_noinstr() functions and mark them with
> __no_sanitize_memory. For the non-KMSAN builds make them
> __always_inline in order to keep the generated code quality. Also
> define __put_user_{char,short,int,long}() functions, which call the
> aforementioned ones and which *are* instrumented, because they call
> KMSAN hooks, which may be implemented as macros.

I am not really familiar with s390 assembly, but I think you still
need to call kmsan_copy_to_user() and kmsan_copy_from_user() to
properly initialize the copied data and report infoleaks.
Would it be possible to insert calls to linux/instrumented.h hooks
into uaccess functions?

Re: [PATCH] tools: build: use correct lib name for libtracefs feature detection

2024-06-18 Thread Daniel Bristot de Oliveira

Adding Arnaldo

On 6/17/24 20:38, Daniel Wagner wrote:
> Use libtracefs as package name to lookup the CFLAGS for libtracefs. This
> makes it possible to use the distro specific path as include path for
> the header file.
> 
> Signed-off-by: Daniel Wagner 
> ---
> Our downstream packaging stop working. After a bit of didding I found out that
> the libtracefs feature detection is not completely right.
> 
> https://build.opensuse.org/build/benchmark/SLE_15_SP5/x86_64/rtla/_log
> ---
>  tools/build/feature/Makefile  | 2 +-
>  tools/build/feature/test-libtracefs.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
> index ed54cef450f5..489cbed7e82a 100644
> --- a/tools/build/feature/Makefile
> +++ b/tools/build/feature/Makefile
> @@ -213,7 +213,7 @@ endif
>   $(BUILD) -ltraceevent
>  
>  $(OUTPUT)test-libtracefs.bin:
> -  $(BUILD) $(shell $(PKG_CONFIG) --cflags libtraceevent 2>/dev/null) 
> -ltracefs
> +  $(BUILD) $(shell $(PKG_CONFIG) --cflags libtracefs 2>/dev/null) 
> -ltracefs
>  
>  $(OUTPUT)test-libcrypto.bin:
>   $(BUILD) -lcrypto
> diff --git a/tools/build/feature/test-libtracefs.c 
> b/tools/build/feature/test-libtracefs.c
> index 8eff16c0c10b..29a757a7d848 100644
> --- a/tools/build/feature/test-libtracefs.c
> +++ b/tools/build/feature/test-libtracefs.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0
> -#include 
> +#include 
>  
>  int main(void)
>  {
> 
> ---
> base-commit: 0bbac3facb5d6cc0171c45c9873a2dc96bea9680
> change-id: 20240617-rtla-build-83020baf9277
> 
> Best regards,

Re: [PATCH v5 3/4] net: dropreason: use new __print_sym() in tracing

2024-06-18 Thread Paolo Abeni

On Fri, 2024-06-14 at 10:19 +0200, Johannes Berg wrote:
> From: Johannes Berg 
> 
> The __print_symbolic() could only ever print the core
> drop reasons, since that's the way the infrastructure
> works. Now that we have __print_sym() with all the
> advantages mentioned in that commit, convert to that
> and get all the drop reasons from all subsystems. As
> we already have a list of them, that's really easy.
> 
> This is a little bit of .text (~100 bytes in my build)
> and saves a lot of .data (~17k).
> 
> Signed-off-by: Johannes Berg 
> ---
>  include/net/dropreason.h   |  5 +
>  include/trace/events/skb.h | 16 +++---
>  net/core/skbuff.c  | 43 ++
>  3 files changed, 51 insertions(+), 13 deletions(-)
> 
> diff --git a/include/net/dropreason.h b/include/net/dropreason.h
> index 56cb7be92244..c157070b5303 100644
> --- a/include/net/dropreason.h
> +++ b/include/net/dropreason.h
> @@ -42,6 +42,11 @@ struct drop_reason_list {
>  extern const struct drop_reason_list __rcu *
>  drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM];
>  
> +#ifdef CONFIG_TRACEPOINTS
> +const char *drop_reason_lookup(unsigned long long value);
> +void drop_reason_show(struct seq_file *m);
> +#endif
> +
>  void drop_reasons_register_subsys(enum skb_drop_reason_subsys subsys,
> const struct drop_reason_list *list);
>  void drop_reasons_unregister_subsys(enum skb_drop_reason_subsys subsys);
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index 07e0715628ec..8a1a63f9e796 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -8,15 +8,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
> -#undef FN
> -#define FN(reason)   TRACE_DEFINE_ENUM(SKB_DROP_REASON_##reason);
> -DEFINE_DROP_REASON(FN, FN)
> -
> -#undef FN
> -#undef FNe
> -#define FN(reason)   { SKB_DROP_REASON_##reason, #reason },
> -#define FNe(reason)  { SKB_DROP_REASON_##reason, #reason }
> +TRACE_DEFINE_SYM_FNS(drop_reason, drop_reason_lookup, drop_reason_show);
>  
>  /*
>   * Tracepoint for free an sk_buff:
> @@ -44,13 +38,9 @@ TRACE_EVENT(kfree_skb,
>  
>   TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
> __entry->skbaddr, __entry->protocol, __entry->location,
> -   __print_symbolic(__entry->reason,
> -DEFINE_DROP_REASON(FN, FNe)))
> +   __print_sym(__entry->reason, drop_reason ))

Minor nit: if you have to repost for other reasons,   ^^ here
checkpatch complains for the extra space.

Otherwise LGTM,

Thanks!

Paolo

Re: [PATCH v2] x86/paravirt: Disable virt spinlock on bare metal

2024-06-18 Thread Nikolay Borisov





On 26.05.24 г. 4:58 ч., Chen Yu wrote:

The kernel can change spinlock behavior when running as a guest. But
this guest-friendly behavior causes performance problems on bare metal.
So there's a 'virt_spin_lock_key' static key to switch between the two
modes.

The static key is always enabled by default (run in guest mode) and
should be disabled for bare metal (and in some guests that want native
behavior).

Performance drop is reported when running encode/decode workload and
BenchSEE cache sub-workload.
Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS
is disabled the virt_spin_lock_key is incorrectly set to true on bare
metal. The qspinlock degenerates to test-and-set spinlock, which
decrease the performance on bare metal.

Fix this by disabling virt_spin_lock_key if it is on bare metal,
regardless of CONFIG_PARAVIRT_SPINLOCKS.



nit:

This bug wouldn't have happened if the key was defined FALSE by default 
and only enabled in the appropriate case. I think it makes more sense to 
invert the logic and have the key FALSE by default and only enable it 
iff the kernel is running under a hypervisor... At worst only the 
virtualization case would suffer if the lock is falsely not enabled.

[PATCH] fgraph: Use str_plural() in test_graph_storage_single()

2024-06-18 Thread Jiapeng Chong

Use existing str_plural() function rather than duplicating its
implementation.

./kernel/trace/trace_selftest.c:880:56-60: opportunity for str_plural(size).

Reported-by: Abaci Robot 
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9349
Signed-off-by: Jiapeng Chong 
---
 kernel/trace/trace_selftest.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index adf0f436d84b..97f1e4bc47dc 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -877,7 +877,7 @@ static int __init test_graph_storage_single(struct 
fgraph_fixture *fixture)
int ret;
 
pr_cont("PASSED\n");
-   pr_info("Testing fgraph storage of %d byte%s: ", size, size > 1 ? "s" : 
"");
+   pr_info("Testing fgraph storage of %d byte%s: ", size, 
str_plural(size));
 
ret = init_fgraph_fixture(fixture);
if (ret && ret != -ENODEV) {
-- 
2.20.1.7.g153144c

Re: [PATCH net-next v5 7/7] af_packet: use sk_skb_reason_drop to free rx packets

2024-06-18 Thread Jesper Dangaard Brouer




On 17/06/2024 20.09, Yan Zhai wrote:

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot
Closes:https://lore.kernel.org/r/202406011859.aacus8gv-...@intel.com/
Signed-off-by: Yan Zhai
---
v2->v3: fixed uninitialized sk, added missing report tags.
---
  net/packet/af_packet.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)


Acked-by: Jesper Dangaard Brouer

Re: [PATCH net-next v5 6/7] udp: use sk_skb_reason_drop to free rx packets

2024-06-18 Thread Jesper Dangaard Brouer




On 17/06/2024 20.09, Yan Zhai wrote:

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot
Closes:https://lore.kernel.org/r/202406011751.npvn0ssk-...@intel.com/
Signed-off-by: Yan Zhai
---
v2->v3: added missing report tags
---
  net/ipv4/udp.c | 10 +-
  net/ipv6/udp.c | 10 +-
  2 files changed, 10 insertions(+), 10 deletions(-)


Acked-by: Jesper Dangaard Brouer

Re: [PATCH net-next v5 5/7] tcp: use sk_skb_reason_drop to free rx packets

2024-06-18 Thread Jesper Dangaard Brouer




On 17/06/2024 20.09, Yan Zhai wrote:

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot
Closes:https://lore.kernel.org/r/202406011539.jhwbd7dx-...@intel.com/
Signed-off-by: Yan Zhai
---
v2->v3: added missing report tags
---
  net/ipv4/syncookies.c | 2 +-
  net/ipv4/tcp_input.c  | 2 +-
  net/ipv4/tcp_ipv4.c   | 6 +++---
  net/ipv6/syncookies.c | 2 +-
  net/ipv6/tcp_ipv6.c   | 6 +++---
  5 files changed, 9 insertions(+), 9 deletions(-)


Acked-by: Jesper Dangaard Brouer

Re: [PATCH net-next v5 4/7] net: raw: use sk_skb_reason_drop to free rx packets

2024-06-18 Thread Jesper Dangaard Brouer




On 17/06/2024 20.09, Yan Zhai wrote:

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Signed-off-by: Yan Zhai
---
  net/ipv4/raw.c | 4 ++--
  net/ipv6/raw.c | 8 
  2 files changed, 6 insertions(+), 6 deletions(-)



Acked-by: Jesper Dangaard Brouer

Re: [PATCH net-next v5 3/7] ping: use sk_skb_reason_drop to free rx packets

2024-06-18 Thread Jesper Dangaard Brouer




On 17/06/2024 20.09, Yan Zhai wrote:

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Signed-off-by: Yan Zhai
---
  net/ipv4/ping.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Acked-by: Jesper Dangaard Brouer

Re: [PATCH net-next v5 2/7] net: introduce sk_skb_reason_drop function

2024-06-18 Thread Jesper Dangaard Brouer





On 17/06/2024 20.09, Yan Zhai wrote:

Long used destructors kfree_skb and kfree_skb_reason do not pass
receiving socket to packet drop tracepoints trace_kfree_skb.
This makes it hard to track packet drops of a certain netns (container)
or a socket (user application).

The naming of these destructors are also not consistent with most sk/skb
operating functions, i.e. functions named "sk_xxx" or "skb_xxx".
Introduce a new functions sk_skb_reason_drop as drop-in replacement for
kfree_skb_reason on local receiving path. Callers can now pass receiving
sockets to the tracepoints.

kfree_skb and kfree_skb_reason are still usable but they are now just
inline helpers that call sk_skb_reason_drop.

Note it is not feasible to do the same to consume_skb. Packets not
dropped can flow through multiple receive handlers, and have multiple
receiving sockets. Leave it untouched for now.

Suggested-by: Eric Dumazet
Signed-off-by: Yan Zhai
---
v1->v2: changes function names to be more consistent with common sk/skb
operations
---
  include/linux/skbuff.h | 10 --
  net/core/skbuff.c  | 22 --
  2 files changed, 20 insertions(+), 12 deletions(-)


Acked-by: Jesper Dangaard Brouer

1 2 >

1 - 100 of 101 matches

Mail list logo