On Mon, Jan 12, 2026 at 11:57:41PM +0530, Vishal Chourasia wrote:
> Hello Joel, Paul, Uladzislau,
>
> On Mon, Jan 12, 2026 at 06:05:30PM +0100, Uladzislau Rezki wrote:
> > On Mon, Jan 12, 2026 at 08:48:42AM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 12, 2026 at 04:09:49PM +0000, Joel Fernandes wrote:
> > > >
> > > >
> > > > > On Jan 12, 2026, at 7:57 AM, Uladzislau Rezki <[email protected]>
> > > > > wrote:
> > > > >
> > > > >>
> > > > > Sounds good to me. I agree it is better to bypass parameters.
> > > >
> > > > Another way to make it in-kernel would be to make the RCU normal wake
> > > > from GP optimization enabled for > 16 CPUs by default.
> > > >
> > > > I was considering this, but I did not bring it up because I did not
> > > > know that there are large systems that might benefit from it until now.
> > >
> > > This would require increasing the scalability of this optimization,
> > > right? Or am I thinking of the wrong optimization? ;-)
> > >
> > I tested this before. I noticed that scalability work is required only
> > beyond 64K simultaneous synchronize_rcu() calls. Anything below that
> > was faster with the new approach.
>
> It is worth noting that bulk CPU hotplug represents a different stress
> pattern than the "simultaneous call" scenario mentioned above.
>
> In a large-scale hotplug event (like an SMT mode switch), we aren't
> necessarily seeing thousands of simultaneous synchronize_rcu() calls.
> Instead, because CPU hotplug operations are serialized, we see a
> "conveyor belt" of sequential calls. One synchronize_rcu() blocks, the
> hotplug state machine waits, it unblocks, and then the next call is
> triggered shortly after.
>
> The bottleneck here isn't RCU scalability under concurrent load, but
> rather the accumulated latency of hundreds of sequential grace periods.
>
> For example, on pSeries, onlining 350 out of 400 CPUs triggers exactly
> 350 calls at each of three different points in the hotplug state
> machine. Even though they happen one at a time, the sheer volume makes
> the total operation time prohibitive.
>
> The following call stacks were collected during an SMT mode switch in
> which 350 out of 400 CPUs were onlined:
>
> @[
> synchronize_rcu+12
> cpuidle_pause_and_lock+120
> pseries_cpuidle_cpu_online+88
> cpuhp_invoke_callback+500
> cpuhp_thread_fun+316
> smpboot_thread_fn+512
> kthread+308
> start_kernel_thread+20
> ]: 350
> @[
> synchronize_rcu+12
> rcu_sync_enter+260
> percpu_down_write+76
> _cpu_up+140
> cpu_up+440
> cpu_subsys_online+128
> device_online+176
> online_store+220
> dev_attr_store+52
> sysfs_kf_write+120
> kernfs_fop_write_iter+456
> vfs_write+952
> ksys_write+132
> system_call_exception+292
> system_call_vectored_common+348
> ]: 350
> @[
> synchronize_rcu+12
> rcu_sync_enter+260
> percpu_down_write+76
> try_online_node+64
> cpu_up+120
> cpu_subsys_online+128
> device_online+176
> online_store+220
> dev_attr_store+52
> sysfs_kf_write+120
> kernfs_fop_write_iter+456
> vfs_write+952
> ksys_write+132
> system_call_exception+292
> system_call_vectored_common+348
> ]: 350
>
> The following call stacks were collected during an SMT mode switch in
> which 350 out of 400 CPUs were offlined:
>
> @[
> synchronize_rcu+12
> rcu_sync_enter+260
> percpu_down_write+76
> _cpu_down+188
> __cpu_down_maps_locked+44
> work_for_cpu_fn+56
> process_one_work+508
> worker_thread+840
> kthread+308
> start_kernel_thread+20
> ]: 1
> @[
> synchronize_rcu+12
> sched_cpu_deactivate+244
> cpuhp_invoke_callback+500
> cpuhp_thread_fun+316
> smpboot_thread_fn+512
> kthread+308
> start_kernel_thread+20
> ]: 350
> @[
> synchronize_rcu+12
> cpuidle_pause_and_lock+120
> pseries_cpuidle_cpu_dead+88
> cpuhp_invoke_callback+500
> __cpuhp_invoke_callback_range+200
> _cpu_down+412
> __cpu_down_maps_locked+44
> work_for_cpu_fn+56
> process_one_work+508
> worker_thread+840
> kthread+308
> start_kernel_thread+20
> ]: 350

I still suggest that you test on a big system. There are other sources
of synchronize_rcu() calls than just CPU hotplug. ;-)

Thanx, Paul
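
For readers who want to put rough numbers on the "conveyor belt" argument quoted above, here is a minimal back-of-envelope sketch. It assumes every synchronize_rcu() call waits one full grace period and that the calls never overlap. The function name and the per-grace-period latencies are hypothetical placeholders, not measurements from this thread; only the 350 CPUs and the three call sites come from the traces.

```python
# Back-of-envelope model of serialized synchronize_rcu() waits during bulk
# CPU hotplug. ASSUMPTION: the grace-period latencies below are hypothetical
# placeholders, not values measured on any system discussed in this thread.

def total_wait_seconds(cpus: int, calls_per_cpu: int, gp_latency_ms: float) -> float:
    """Time spent blocked if each call waits one grace period, strictly in sequence."""
    return cpus * calls_per_cpu * gp_latency_ms / 1000.0


if __name__ == "__main__":
    # 350 CPUs onlined, three synchronize_rcu() call sites per CPU (from the traces).
    for gp_ms in (5.0, 20.0, 50.0):  # hypothetical per-grace-period latencies
        seconds = total_wait_seconds(350, 3, gp_ms)
        print(f"{gp_ms:5.1f} ms per GP -> {seconds:6.2f} s blocked in total")
```

Because the waits are serialized, the total grows linearly with both the number of CPUs and the per-grace-period latency, so in this model shortening each wait helps and adding concurrency does not.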