On Wed, Dec 18, 2024 at 08:33:09PM +0100, Magnus Lindholm wrote:
> Hi again,
>
> So, various versions of GCC can trigger/untrigger the RCU bugs that
> I've been hitting. This may of course still be due to GCC miscompiling
> some code on alpha but that said I've taken another look at the code
> involved when this is triggered. From what I've seen so far, this
> happens when kernel threads are used in the code to access structured
> data stored on the stack (structs). The quote below is from a comment
> in the kernel code (include/linux/percpu-defs.h)
>
> "s390 and alpha modules require percpu variables to be defined as weak
> to force the compiler to generate GOT based external references for
> them. This is necessary because percpu sections will be located
> outside of the usually addressable area. This definition puts the
> following two extra restrictions when defining percpu variables.
>
> 1. The symbol must be globally unique, even the static ones.
>
> 2. Static percpu variables cannot be defined inside a function."
>
>
> Taking these notes into account I've made some minor modifications to
> the code (briefly described below). The modifications involve
> declaring some structs previously placed on stack as
> DEFINE_PER_CPU_SHARED_ALIGNED. This is already done for some strucs in
> rcu/smp code. Making these modifications fixes the problem and I can
> build the kernel with GCC versions that previously triggered the bug.
> The same approach fixed both the network interface-rename bug as well
> as the scsi module unload bug.
>
> in kernel/smp/smp.c
> -------------------
> #if defined(ARCH_NEEDS_WEAK_PER_CPU)
> smp_call_function_single(...) use csd = this_cpu_ptr(&csd_data)
> regardless if its called with wait = 0 or 1. Make sure to declare
> csd_data as "DEFINE_PER_CPU_SHARED_ALIGNED"
> #endif
If I understand your proposed changes correctly, this could increase
the amount of time CPUs spend waiting for for the csd_data to become free.
This would be a very unwelcome development in some quarters.
> in kernel/rcu/tree.c
> --------------------
> #use rew_data instead of stack allocated struct
> static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_exp_work, rew_data);
I am not finding this one.
> in kernel/rcu/tree_exp.h
> ------------------------
> #if defined(ARCH_NEEDS_WEAK_PER_CPU)
> void synchronize_rcu_expedited(void)
>
> in stead of:
> struct rcu_exp_work rew;
>
> do:
> #if defined(ARCH_NEEDS_WEAK_PER_CPU)
> struct rcu_exp_work *rewp;
> rewp=this_cpu_ptr(&rew_data);
>
> rew_data is declared "DEFINE_PER_CPU_SHARED_ALIGNED"
>
> #endif
OK, I *can* find this one, thank you. At least assuming you mean the
local variable "rew" defined in synchronize_rcu_expedited().
I don't see how this relates to the per-CPU variable restrictions called
out above relates here. We do not have any sort of per-CPU variable here,
but instead a simple structure allocated on the stack. And a relatively
small structure at that.
In addition, making this be a per-CPU variable will break if a given
CPU invokes a second synchronize_rcu_expedited() before its first call
to synchronize_rcu_expedited() returns. You will end up with the second
call clobbering that per-CPU variable, which will likely fatally confuse
the workqueue when attempting to manage the resulting workqueue handlers.
You might need heavy load to make this happen, but I don't see anything
preventing it from happening.
So what am I missing here?
Thanx, Paul