Re: [PATCH 1/2] srcu: Fix broken node geometry after early ssp init

2021-04-19 Thread Paul E. McKenney
On Mon, Apr 19, 2021 at 02:23:45AM +0200, Frederic Weisbecker wrote:
> On Sat, Apr 17, 2021 at 09:46:16PM -0700, Paul E. McKenney wrote:
> > On Sat, Apr 17, 2021 at 03:16:49PM +0200, Frederic Weisbecker wrote:
> > > On Wed, Apr 14, 2021 at 08:55:38AM -0700, Paul E. McKenney wrote:
> > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > > index 75ed367d5b60..24db97cbf76b 100644
> > > > > --- a/kernel/rcu/rcu.h
> > > > > +++ b/kernel/rcu/rcu.h
> > > > > @@ -278,6 +278,7 @@ extern void resched_cpu(int cpu);
> > > > >  extern int rcu_num_lvls;
> > > > >  extern int num_rcu_lvl[];
> > > > >  extern int rcu_num_nodes;
> > > > > +extern bool rcu_geometry_initialized;
> > > > 
> > > > Can this be a static local variable inside rcu_init_geometry()?
> > > > 
> > > > After all, init_srcu_struct() isn't called all that often, and its overhead
> > > > is such that an extra function call and check is not going to hurt it.  This
> > > > of course requires removing __init from rcu_init_geometry(), but it is not
> > > > all that large, so why not just remove the __init?
> > > > 
> > > > But if we really are worried about reclaiming rcu_init_geometry()'s
> > > > instructions (maybe we are?), then rcu_init_geometry() can be split
> > > > into a function that just does the check (which is not __init) and the
> > > > remainder of the function, which could remain __init.
> > > 
> > > There you go:
> > 
> > Queued, thank you!
> 
> Thanks!
> 
> And please also consider "[PATCH 2/2] srcu: Early test SRCU polling start"
> if you want to expand testing coverage to polling.

Ah, thank you for the reminder!  Queued and pushed.

Thanx, Paul


Re: [PATCH 1/2] srcu: Fix broken node geometry after early ssp init

2021-04-18 Thread Frederic Weisbecker
On Sat, Apr 17, 2021 at 09:46:16PM -0700, Paul E. McKenney wrote:
> On Sat, Apr 17, 2021 at 03:16:49PM +0200, Frederic Weisbecker wrote:
> > On Wed, Apr 14, 2021 at 08:55:38AM -0700, Paul E. McKenney wrote:
> > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > index 75ed367d5b60..24db97cbf76b 100644
> > > > --- a/kernel/rcu/rcu.h
> > > > +++ b/kernel/rcu/rcu.h
> > > > @@ -278,6 +278,7 @@ extern void resched_cpu(int cpu);
> > > >  extern int rcu_num_lvls;
> > > >  extern int num_rcu_lvl[];
> > > >  extern int rcu_num_nodes;
> > > > +extern bool rcu_geometry_initialized;
> > > 
> > > Can this be a static local variable inside rcu_init_geometry()?
> > > 
> > > After all, init_srcu_struct() isn't called all that often, and its overhead
> > > is such that an extra function call and check is not going to hurt it.  This
> > > of course requires removing __init from rcu_init_geometry(), but it is not
> > > all that large, so why not just remove the __init?
> > > 
> > > But if we really are worried about reclaiming rcu_init_geometry()'s
> > > instructions (maybe we are?), then rcu_init_geometry() can be split
> > > into a function that just does the check (which is not __init) and the
> > > remainder of the function, which could remain __init.
> > 
> > There you go:
> 
> Queued, thank you!

Thanks!

And please also consider "[PATCH 2/2] srcu: Early test SRCU polling start"
if you want to expand testing coverage to polling.


Re: [PATCH 1/2] srcu: Fix broken node geometry after early ssp init

2021-04-17 Thread Paul E. McKenney
On Sat, Apr 17, 2021 at 03:16:49PM +0200, Frederic Weisbecker wrote:
> On Wed, Apr 14, 2021 at 08:55:38AM -0700, Paul E. McKenney wrote:
> > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > index 75ed367d5b60..24db97cbf76b 100644
> > > --- a/kernel/rcu/rcu.h
> > > +++ b/kernel/rcu/rcu.h
> > > @@ -278,6 +278,7 @@ extern void resched_cpu(int cpu);
> > >  extern int rcu_num_lvls;
> > >  extern int num_rcu_lvl[];
> > >  extern int rcu_num_nodes;
> > > +extern bool rcu_geometry_initialized;
> > 
> > Can this be a static local variable inside rcu_init_geometry()?
> > 
> > After all, init_srcu_struct() isn't called all that often, and its overhead
> > is such that an extra function call and check is not going to hurt it.  This
> > of course requires removing __init from rcu_init_geometry(), but it is not
> > all that large, so why not just remove the __init?
> > 
> > But if we really are worried about reclaiming rcu_init_geometry()'s
> > instructions (maybe we are?), then rcu_init_geometry() can be split
> > into a function that just does the check (which is not __init) and the
> > remainder of the function, which could remain __init.
> 
> There you go:

Queued, thank you!

Thanx, Paul

> ---
> From: Frederic Weisbecker 
> Date: Wed, 31 Mar 2021 16:10:36 +0200
> Subject: [PATCH] srcu: Fix broken node geometry after early ssp init
> 
> An ssp initialized before rcu_init_geometry() will have its snp hierarchy
> based on CONFIG_NR_CPUS.
> 
> Once rcu_init_geometry() is called, the node distribution is shrunk
> and optimized toward meeting the actual possible number of CPUs detected
> on boot.
> 
> Later on, the ssp that was initialized before rcu_init_geometry() is
> confused and sometimes refers to its initial CONFIG_NR_CPUS based node
> hierarchy, sometimes to the new num_possible_cpus() based one instead.
> For example, each of its sdp->mynode pointers remains stale and refers to
> the early leaf nodes that may not exist anymore. On the other hand,
> srcu_for_each_node_breadth_first() refers to the new node hierarchy.
> 
> There are at least two bad possible outcomes to this:
> 
> 1) a) A callback enqueued early on an sdp is recorded pending on
>       sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with
>       sdp->mynode pointing to a deep leaf (say 3 levels).
> 
>    b) The grace period ends after rcu_init_geometry() which shrinks the
>       nodes level to a single one. srcu_gp_end() walks through the new
>       snp hierarchy without ever reaching the old leaves so the callback
>       is never executed.
> 
>    This is easily reproduced on an 8 CPUs machine with
>    CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The
>    srcu_barrier() after early tests verification never completes and
>    the boot hangs:
> 
>   [ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
>   [ 5413.147564]   Not tainted 5.12.0-rc4+ #28
>   [ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   [ 5413.159753] task:swapper/0   state:D stack:0 pid:1 ppid: 0 flags:0x4000
>   [ 5413.168099] Call Trace:
>   [ 5413.170555]  __schedule+0x36c/0x930
>   [ 5413.174057]  ? wait_for_completion+0x88/0x110
>   [ 5413.178423]  schedule+0x46/0xf0
>   [ 5413.181575]  schedule_timeout+0x284/0x380
>   [ 5413.185591]  ? wait_for_completion+0x88/0x110
>   [ 5413.189957]  ? mark_held_locks+0x61/0x80
>   [ 5413.193882]  ? mark_held_locks+0x61/0x80
>   [ 5413.197809]  ? _raw_spin_unlock_irq+0x24/0x50
>   [ 5413.202173]  ? wait_for_completion+0x88/0x110
>   [ 5413.206535]  wait_for_completion+0xb4/0x110
>   [ 5413.210724]  ? srcu_torture_stats_print+0x110/0x110
>   [ 5413.215610]  srcu_barrier+0x187/0x200
>   [ 5413.219277]  ? rcu_tasks_verify_self_tests+0x50/0x50
>   [ 5413.224244]  ? rdinit_setup+0x2b/0x2b
>   [ 5413.227907]  rcu_verify_early_boot_tests+0x2d/0x40
>   [ 5413.232700]  do_one_initcall+0x63/0x310
>   [ 5413.236541]  ? rdinit_setup+0x2b/0x2b
>   [ 5413.240207]  ? rcu_read_lock_sched_held+0x52/0x80
>   [ 5413.244912]  kernel_init_freeable+0x253/0x28f
>   [ 5413.249273]  ? rest_init+0x250/0x250
>   [ 5413.252846]  kernel_init+0xa/0x110
>   [ 5413.256257]  ret_from_fork+0x22/0x30
> 
> 2) An ssp that gets initialized before rcu_init_geometry() and used
>    afterward will always have stale sdp->mynode references, resulting in
>    callbacks being missed in srcu_gp_end(), just like in the previous
>    scenario.
> 
> Solve this by initializing the node geometry whenever a struct srcu_state
> happens to be initialized before rcu_init(). This way we make sure the
> RCU node hierarchy is properly built and distributed before the nodes
> of a struct srcu_state are allocated.
> 
> Suggested-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 
> Cc: Boqun Feng 
> Cc: Lai 

Re: [PATCH 1/2] srcu: Fix broken node geometry after early ssp init

2021-04-17 Thread Frederic Weisbecker
On Wed, Apr 14, 2021 at 08:55:38AM -0700, Paul E. McKenney wrote:
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index 75ed367d5b60..24db97cbf76b 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -278,6 +278,7 @@ extern void resched_cpu(int cpu);
> >  extern int rcu_num_lvls;
> >  extern int num_rcu_lvl[];
> >  extern int rcu_num_nodes;
> > +extern bool rcu_geometry_initialized;
> 
> Can this be a static local variable inside rcu_init_geometry()?
> 
> After all, init_srcu_struct() isn't called all that often, and its overhead
> is such that an extra function call and check is not going to hurt it.  This
> of course requires removing __init from rcu_init_geometry(), but it is not
> all that large, so why not just remove the __init?
> 
> But if we really are worried about reclaiming rcu_init_geometry()'s
> instructions (maybe we are?), then rcu_init_geometry() can be split
> into a function that just does the check (which is not __init) and the
> remainder of the function, which could remain __init.

There you go:

---
From: Frederic Weisbecker 
Date: Wed, 31 Mar 2021 16:10:36 +0200
Subject: [PATCH] srcu: Fix broken node geometry after early ssp init

An ssp initialized before rcu_init_geometry() will have its snp hierarchy
based on CONFIG_NR_CPUS.

Once rcu_init_geometry() is called, the node distribution is shrunk
and optimized toward meeting the actual possible number of CPUs detected
on boot.

Later on, the ssp that was initialized before rcu_init_geometry() is
confused and sometimes refers to its initial CONFIG_NR_CPUS based node
hierarchy, sometimes to the new num_possible_cpus() based one instead.
For example, each of its sdp->mynode pointers remains stale and refers to
the early leaf nodes that may not exist anymore. On the other hand,
srcu_for_each_node_breadth_first() refers to the new node hierarchy.

There are at least two bad possible outcomes to this:

1) a) A callback enqueued early on an sdp is recorded pending on
      sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with
      sdp->mynode pointing to a deep leaf (say 3 levels).

   b) The grace period ends after rcu_init_geometry() which shrinks the
      nodes level to a single one. srcu_gp_end() walks through the new
      snp hierarchy without ever reaching the old leaves so the callback
      is never executed.

   This is easily reproduced on an 8 CPUs machine with
   CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The
   srcu_barrier() after early tests verification never completes and
   the boot hangs:

[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
[ 5413.147564]   Not tainted 5.12.0-rc4+ #28
[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5413.159753] task:swapper/0   state:D stack:0 pid:1 ppid: 0 flags:0x4000
[ 5413.168099] Call Trace:
[ 5413.170555]  __schedule+0x36c/0x930
[ 5413.174057]  ? wait_for_completion+0x88/0x110
[ 5413.178423]  schedule+0x46/0xf0
[ 5413.181575]  schedule_timeout+0x284/0x380
[ 5413.185591]  ? wait_for_completion+0x88/0x110
[ 5413.189957]  ? mark_held_locks+0x61/0x80
[ 5413.193882]  ? mark_held_locks+0x61/0x80
[ 5413.197809]  ? _raw_spin_unlock_irq+0x24/0x50
[ 5413.202173]  ? wait_for_completion+0x88/0x110
[ 5413.206535]  wait_for_completion+0xb4/0x110
[ 5413.210724]  ? srcu_torture_stats_print+0x110/0x110
[ 5413.215610]  srcu_barrier+0x187/0x200
[ 5413.219277]  ? rcu_tasks_verify_self_tests+0x50/0x50
[ 5413.224244]  ? rdinit_setup+0x2b/0x2b
[ 5413.227907]  rcu_verify_early_boot_tests+0x2d/0x40
[ 5413.232700]  do_one_initcall+0x63/0x310
[ 5413.236541]  ? rdinit_setup+0x2b/0x2b
[ 5413.240207]  ? rcu_read_lock_sched_held+0x52/0x80
[ 5413.244912]  kernel_init_freeable+0x253/0x28f
[ 5413.249273]  ? rest_init+0x250/0x250
[ 5413.252846]  kernel_init+0xa/0x110
[ 5413.256257]  ret_from_fork+0x22/0x30

2) An ssp that gets initialized before rcu_init_geometry() and used
   afterward will always have stale sdp->mynode references, resulting in
   callbacks being missed in srcu_gp_end(), just like in the previous
   scenario.

Solve this by initializing the node geometry whenever a struct srcu_state
happens to be initialized before rcu_init(). This way we make sure the
RCU node hierarchy is properly built and distributed before the nodes
of a struct srcu_state are allocated.

Suggested-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Boqun Feng 
Cc: Lai Jiangshan 
Cc: Neeraj Upadhyay 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Uladzislau Rezki 
---
 kernel/rcu/rcu.h      |  2 ++
 kernel/rcu/srcutree.c |  3 +++
 kernel/rcu/tree.c     | 17 ++++++++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 

Re: [PATCH 1/2] srcu: Fix broken node geometry after early ssp init

2021-04-16 Thread Frederic Weisbecker
On Wed, Apr 14, 2021 at 08:55:38AM -0700, Paul E. McKenney wrote:
> On Wed, Apr 14, 2021 at 03:24:12PM +0200, Frederic Weisbecker wrote:
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index 75ed367d5b60..24db97cbf76b 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -278,6 +278,7 @@ extern void resched_cpu(int cpu);
> >  extern int rcu_num_lvls;
> >  extern int num_rcu_lvl[];
> >  extern int rcu_num_nodes;
> > +extern bool rcu_geometry_initialized;
> 
> Can this be a static local variable inside rcu_init_geometry()?
> 
> After all, init_srcu_struct() isn't called all that often, and its overhead
> is such that an extra function call and check is not going to hurt it.  This
> of course requires removing __init from rcu_init_geometry(), but it is not
> all that large, so why not just remove the __init?
> 
> But if we really are worried about reclaiming rcu_init_geometry()'s
> instructions (maybe we are?), then rcu_init_geometry() can be split
> into a function that just does the check (which is not __init) and the
> remainder of the function, which could remain __init.

Indeed that makes sense, I'll move the variable inside rcu_init_geometry().
Also since rcu_init_geometry() can now be called anytime after the boot, I
already removed the __init. I don't think we can do the split trick because a
non-init function can't call an __init function. That would trigger a section
mismatch.


> > @@ -171,6 +171,8 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
> >  	ssp->sda = alloc_percpu(struct srcu_data);
> >  	if (!ssp->sda)
> >  		return -ENOMEM;
> > +	if (!rcu_geometry_initialized)
> > +		rcu_init_geometry();
> 
> With the suggested change above, this just becomes an unconditional call
> to rcu_init_geometry().

Right.

> > -static void __init rcu_init_geometry(void)
> > +void rcu_init_geometry(void)
> >  {
> >  	ulong d;
> >  	int i;
> > +	static unsigned long old_nr_cpu_ids;
> >  	int rcu_capacity[RCU_NUM_LVLS];
> 
> And then rcu_geometry_initialized is declared static here.
> 
> Or am I missing something?

Looks good, I'll resend with that.

Thanks!

> 
> > +	if (rcu_geometry_initialized) {
> > +		/*
> > +		 * Arrange for warning if rcu_init_geometry() was called before
> > +		 * setup_nr_cpu_ids(). We may miss cases when
> > +		 * nr_cpu_ids == NR_CPUS but that shouldn't matter too much.
> > +		 */
> > +		WARN_ON_ONCE(old_nr_cpu_ids != nr_cpu_ids);
> > +		return;
> > +	}
> > +
> > +	old_nr_cpu_ids = nr_cpu_ids;
> > +	rcu_geometry_initialized = true;
> > +
> >  	/*
> >  	 * Initialize any unspecified boot parameters.
> >  	 * The default values of jiffies_till_first_fqs and
> > -- 
> > 2.25.1
> > 


Re: [PATCH 1/2] srcu: Fix broken node geometry after early ssp init

2021-04-14 Thread Paul E. McKenney
On Wed, Apr 14, 2021 at 03:24:12PM +0200, Frederic Weisbecker wrote:
> An ssp initialized before rcu_init_geometry() will have its snp hierarchy
> based on CONFIG_NR_CPUS.
> 
> Once rcu_init_geometry() is called, the node distribution is shrunk
> and optimized toward meeting the actual possible number of CPUs detected
> on boot.
> 
> Later on, the ssp that was initialized before rcu_init_geometry() is
> confused and sometimes refers to its initial CONFIG_NR_CPUS based node
> hierarchy, sometimes to the new num_possible_cpus() based one instead.
> For example, each of its sdp->mynode pointers remains stale and refers to
> the early leaf nodes that may not exist anymore. On the other hand,
> srcu_for_each_node_breadth_first() refers to the new node hierarchy.
> 
> There are at least two bad possible outcomes to this:
> 
> 1) a) A callback enqueued early on an sdp is recorded pending on
>       sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with
>       sdp->mynode pointing to a deep leaf (say 3 levels).
> 
>    b) The grace period ends after rcu_init_geometry() which shrinks the
>       nodes level to a single one. srcu_gp_end() walks through the new
>       snp hierarchy without ever reaching the old leaves so the callback
>       is never executed.
> 
>    This is easily reproduced on an 8 CPUs machine with
>    CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The
>    srcu_barrier() after early tests verification never completes and
>    the boot hangs:
> 
>   [ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
>   [ 5413.147564]   Not tainted 5.12.0-rc4+ #28
>   [ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>   [ 5413.159753] task:swapper/0   state:D stack:0 pid:1 ppid: 0 flags:0x4000
>   [ 5413.168099] Call Trace:
>   [ 5413.170555]  __schedule+0x36c/0x930
>   [ 5413.174057]  ? wait_for_completion+0x88/0x110
>   [ 5413.178423]  schedule+0x46/0xf0
>   [ 5413.181575]  schedule_timeout+0x284/0x380
>   [ 5413.185591]  ? wait_for_completion+0x88/0x110
>   [ 5413.189957]  ? mark_held_locks+0x61/0x80
>   [ 5413.193882]  ? mark_held_locks+0x61/0x80
>   [ 5413.197809]  ? _raw_spin_unlock_irq+0x24/0x50
>   [ 5413.202173]  ? wait_for_completion+0x88/0x110
>   [ 5413.206535]  wait_for_completion+0xb4/0x110
>   [ 5413.210724]  ? srcu_torture_stats_print+0x110/0x110
>   [ 5413.215610]  srcu_barrier+0x187/0x200
>   [ 5413.219277]  ? rcu_tasks_verify_self_tests+0x50/0x50
>   [ 5413.224244]  ? rdinit_setup+0x2b/0x2b
>   [ 5413.227907]  rcu_verify_early_boot_tests+0x2d/0x40
>   [ 5413.232700]  do_one_initcall+0x63/0x310
>   [ 5413.236541]  ? rdinit_setup+0x2b/0x2b
>   [ 5413.240207]  ? rcu_read_lock_sched_held+0x52/0x80
>   [ 5413.244912]  kernel_init_freeable+0x253/0x28f
>   [ 5413.249273]  ? rest_init+0x250/0x250
>   [ 5413.252846]  kernel_init+0xa/0x110
>   [ 5413.256257]  ret_from_fork+0x22/0x30
> 
> 2) An ssp that gets initialized before rcu_init_geometry() and used
>    afterward will always have stale sdp->mynode references, resulting in
>    callbacks being missed in srcu_gp_end(), just like in the previous
>    scenario.
> 
> Solve this by calling rcu_init_geometry() whenever a struct srcu_state
> happens to be initialized before rcu_init(). This way we make sure the
> RCU node hierarchy is properly built and distributed before the nodes
> of a struct srcu_state are allocated.
> 
> Suggested-by: Paul E. McKenney 
> Signed-off-by: Frederic Weisbecker 

Much much nicer, thank you!  Comments and questions interspersed.

Thanx, Paul

> Cc: Boqun Feng 
> Cc: Lai Jiangshan 
> Cc: Neeraj Upadhyay 
> Cc: Josh Triplett 
> Cc: Joel Fernandes 
> Cc: Uladzislau Rezki 
> ---
>  kernel/rcu/rcu.h      |  3 +++
>  kernel/rcu/srcutree.c |  2 ++
>  kernel/rcu/tree.c     | 18 +++++++++++++++++-
>  3 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 75ed367d5b60..24db97cbf76b 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -278,6 +278,7 @@ extern void resched_cpu(int cpu);
>  extern int rcu_num_lvls;
>  extern int num_rcu_lvl[];
>  extern int rcu_num_nodes;
> +extern bool rcu_geometry_initialized;

Can this be a static local variable inside rcu_init_geometry()?

After all, init_srcu_struct() isn't called all that often, and its overhead
is such that an extra function call and check is not going to hurt it.  This
of course requires removing __init from rcu_init_geometry(), but it is not
all that large, so why not just remove the __init?

But if we really are worried about reclaiming rcu_init_geometry()'s
instructions (maybe we are?), then rcu_init_geometry() can be split
into a function that just does the check (which is not __init) and the
remainder of the function, which could remain 

[PATCH 1/2] srcu: Fix broken node geometry after early ssp init

2021-04-14 Thread Frederic Weisbecker
An ssp initialized before rcu_init_geometry() will have its snp hierarchy
based on CONFIG_NR_CPUS.

Once rcu_init_geometry() is called, the node distribution is shrunk
and optimized toward meeting the actual possible number of CPUs detected
on boot.

Later on, the ssp that was initialized before rcu_init_geometry() is
confused and sometimes refers to its initial CONFIG_NR_CPUS based node
hierarchy, sometimes to the new num_possible_cpus() based one instead.
For example, each of its sdp->mynode pointers remains stale and refers to
the early leaf nodes that may not exist anymore. On the other hand,
srcu_for_each_node_breadth_first() refers to the new node hierarchy.

There are at least two bad possible outcomes to this:

1) a) A callback enqueued early on an sdp is recorded pending on
      sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with
      sdp->mynode pointing to a deep leaf (say 3 levels).

   b) The grace period ends after rcu_init_geometry() which shrinks the
      nodes level to a single one. srcu_gp_end() walks through the new
      snp hierarchy without ever reaching the old leaves so the callback
      is never executed.

   This is easily reproduced on an 8 CPUs machine with
   CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The
   srcu_barrier() after early tests verification never completes and
   the boot hangs:

[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
[ 5413.147564]   Not tainted 5.12.0-rc4+ #28
[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5413.159753] task:swapper/0   state:D stack:0 pid:1 ppid: 0 flags:0x4000
[ 5413.168099] Call Trace:
[ 5413.170555]  __schedule+0x36c/0x930
[ 5413.174057]  ? wait_for_completion+0x88/0x110
[ 5413.178423]  schedule+0x46/0xf0
[ 5413.181575]  schedule_timeout+0x284/0x380
[ 5413.185591]  ? wait_for_completion+0x88/0x110
[ 5413.189957]  ? mark_held_locks+0x61/0x80
[ 5413.193882]  ? mark_held_locks+0x61/0x80
[ 5413.197809]  ? _raw_spin_unlock_irq+0x24/0x50
[ 5413.202173]  ? wait_for_completion+0x88/0x110
[ 5413.206535]  wait_for_completion+0xb4/0x110
[ 5413.210724]  ? srcu_torture_stats_print+0x110/0x110
[ 5413.215610]  srcu_barrier+0x187/0x200
[ 5413.219277]  ? rcu_tasks_verify_self_tests+0x50/0x50
[ 5413.224244]  ? rdinit_setup+0x2b/0x2b
[ 5413.227907]  rcu_verify_early_boot_tests+0x2d/0x40
[ 5413.232700]  do_one_initcall+0x63/0x310
[ 5413.236541]  ? rdinit_setup+0x2b/0x2b
[ 5413.240207]  ? rcu_read_lock_sched_held+0x52/0x80
[ 5413.244912]  kernel_init_freeable+0x253/0x28f
[ 5413.249273]  ? rest_init+0x250/0x250
[ 5413.252846]  kernel_init+0xa/0x110
[ 5413.256257]  ret_from_fork+0x22/0x30

2) An ssp that gets initialized before rcu_init_geometry() and used
   afterward will always have stale sdp->mynode references, resulting in
   callbacks being missed in srcu_gp_end(), just like in the previous
   scenario.

Solve this by calling rcu_init_geometry() whenever a struct srcu_state
happens to be initialized before rcu_init(). This way we make sure the
RCU node hierarchy is properly built and distributed before the nodes
of a struct srcu_state are allocated.

Suggested-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Boqun Feng 
Cc: Lai Jiangshan 
Cc: Neeraj Upadhyay 
Cc: Josh Triplett 
Cc: Joel Fernandes 
Cc: Uladzislau Rezki 
---
 kernel/rcu/rcu.h      |  3 +++
 kernel/rcu/srcutree.c |  2 ++
 kernel/rcu/tree.c     | 18 +++++++++++++++++-
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 75ed367d5b60..24db97cbf76b 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -278,6 +278,7 @@ extern void resched_cpu(int cpu);
 extern int rcu_num_lvls;
 extern int num_rcu_lvl[];
 extern int rcu_num_nodes;
+extern bool rcu_geometry_initialized;
 static bool rcu_fanout_exact;
 static int rcu_fanout_leaf;
 
@@ -308,6 +309,8 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
 	}
 }
 
+extern void rcu_init_geometry(void);
+
 /* Returns a pointer to the first leaf rcu_node structure. */
 #define rcu_first_leaf_node() (rcu_state.level[rcu_num_lvls - 1])
 
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 108f9ca06047..05ca3c275af1 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -171,6 +171,8 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
 	ssp->sda = alloc_percpu(struct srcu_data);
 	if (!ssp->sda)
 		return -ENOMEM;
+	if (!rcu_geometry_initialized)
+		rcu_init_geometry();
 	init_srcu_struct_nodes(ssp);
 	ssp->srcu_gp_seq_needed_exp = 0;
 	ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
diff --git