On 6/5/25 8:07 AM, Sebastian Andrzej Siewior wrote:
> The per-CPU data section is handled differently than the other sections.
> The memory allocation requires a special __percpu pointer and then the
> section is copied into the view of each CPU. Therefore the SHF_ALLOC
> flag is removed to ensure move_module() skips it.
> 
> Later, relocations are applied and apply_relocations() skips sections
> without SHF_ALLOC because they have not been copied. This also skips the
> per-CPU data section.
> The missing relocations result in a NULL pointer on x86-64 and very
> small values on x86-32. This results in a crash because such a value,
> unlike a NULL pointer, is not skipped and cannot be dereferenced.
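
A minimal sketch of why the two architectures show different symptoms, assuming the usual toolchain output: x86-32 modules use REL relocations, which keep the addend A in the slot being relocated, while x86-64 modules use RELA relocations, whose addend lives in the relocation entry and whose slot is typically left as zero. The names `reloc_style`, `unrelocated_slot()` and `relocated_slot()` are hypothetical. The sketch models only an absolute S + A relocation, not the kernel's apply_relocate()/apply_relocate_add().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: where a relocation type keeps its addend A. */
enum reloc_style {
	STYLE_REL,	/* x86-32: A is stored in the slot being relocated */
	STYLE_RELA,	/* x86-64: A is stored in the Elf64_Rela entry */
};

/*
 * Value left in a pointer slot when no relocation is applied, i.e. what
 * the per-CPU section still contains if apply_relocations() skips it.
 */
static uint64_t unrelocated_slot(enum reloc_style style, uint64_t addend)
{
	assert(style == STYLE_REL || style == STYLE_RELA);

	/* REL keeps A in place; RELA usually leaves the slot zeroed. */
	return style == STYLE_REL ? addend : 0;
}

/* Value after an absolute relocation: symbol address S plus addend A. */
static uint64_t relocated_slot(uint64_t sym_addr, uint64_t addend)
{
	return sym_addr + addend;
}
```

For a pointer to offset 0x40 within some section, the unrelocated slot reads 0x40 under REL (a small, non-NULL bogus pointer) and 0 under RELA (NULL).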
> 
> Such an assignment happens during static per-CPU lock initialisation
> with lockdep enabled.
> 
> Add the SHF_ALLOC flag back for the per-CPU section (if found) after
> move_module().
> 
> Reported-by: kernel test robot <oliver.s...@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-...@intel.com
> Fixes: 8d8022e8aba85 ("module: do percpu allocation after uniqueness check.  No, really!")

Isn't this already broken by the earlier, pre-Git commit "Don't relocate
non-allocated regions in modules." [1]?

> Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
> ---
> v1…v2: https://lore.kernel.org/all/20250604152707.cied9...@linutronix.de/
>   - Add the flag back only on SMP if the per-CPU section was found.
> 
>  kernel/module/main.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 5c6ab20240a6d..4f6554dedf8ea 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
>       if (err)
>               return ERR_PTR(err);
>  
> +     /* Add SHF_ALLOC back so that relocations are applied. */
> +     if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
> +             info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
> +
>       /* Module has been copied to its final place now: return it. */
>       mod = (void *)info->sechdrs[info->index.mod].sh_addr;
>       kmemleak_load_module(mod, info);

This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
is set by rewrite_section_headers() to point to the percpu data in the
userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
doesn't move and the sh_addr isn't adjusted by move_module(). The
function apply_relocations() then applies the relocations in the initial
ELF copy. Finally, post_relocation() copies the relocated percpu data to
their final per-CPU destinations.

However, I'm not sure it is best to manipulate the SHF_ALLOC flag this
way. Clearing it once is fine, but if we then need to set it again, I
would reconsider the approach.

An alternative approach could be to teach apply_relocations() that the
percpu section is special and should be relocated even though it doesn't
have SHF_ALLOC set. This would also allow adding a comment explaining
that we're relocating the data in the original ELF copy, which I find
useful to mention as it is different to other relocation processing.

For instance:

        /*
         * Don't bother with non-allocated sections.
         *
         * An exception is the percpu section, which has separate allocations
         * for individual CPUs. We relocate the percpu section in the initial
         * ELF template and subsequently copy it to the per-CPU destinations.
         */
        if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
            infosec != info->index.pcpu)
                continue;
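
The proposed condition can be isolated as a predicate and checked on a few cases. In this sketch `struct shdr_model`, `should_relocate()` and its parameters are hypothetical stand-ins for info->sechdrs, infosec and info->index.pcpu, and SHF_ALLOC is defined locally with its ELF value (0x2) instead of coming from a kernel header. The predicate says "relocate" exactly when the `continue` condition above is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHF_ALLOC 0x2	/* section flag value from the ELF specification */

/* Hypothetical, trimmed-down section header for the model. */
struct shdr_model {
	uint64_t sh_flags;
};

/* Should relocations targeting section 'infosec' be applied? */
static bool should_relocate(const struct shdr_model *sechdrs,
			    unsigned int nsec, unsigned int infosec,
			    unsigned int pcpu_idx)
{
	assert(infosec < nsec);

	/* Allocated sections are relocated as usual. */
	if (sechdrs[infosec].sh_flags & SHF_ALLOC)
		return true;

	/*
	 * The per-CPU section is the one non-allocated exception. When the
	 * module has none, pcpu_idx is 0, which only matches the reserved
	 * null section, and that is not a relocation target in a
	 * well-formed module.
	 */
	return infosec == pcpu_idx;
}
```

Keeping the exception inside apply_relocations() leaves sh_flags untouched after layout, so SHF_ALLOC never has to be set again.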

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=b3b91325f3c77ace041f769ada7039ebc7aab8de

-- 
Thanks,
Petr
