On large systems with high core counts, toggling SMT modes via sysfs
(/sys/devices/system/cpu/smt/control) incurs significant latency. For
instance, on a system with ~2000 CPUs, switching SMT modes can take
close to an hour because each hardware thread is hotplugged
individually. This series reduces that time to minutes.
Analysis of the hotplug path [1] identifies synchronize_rcu() as the
primary bottleneck: during a bulk SMT switch, the kernel repeatedly
enters RCU grace periods for each CPU being brought online or offline.
This series optimizes the SMT transition in two ways:

1. Lock batching [1]: Instead of repeatedly acquiring and releasing the
   CPU hotplug write lock for every individual CPU, hold
   cpus_write_lock() across the entire SMT toggle operation.

2. Expedited RCU grace periods [2]: Use rcu_expedite_gp() to force
   expedited grace periods specifically for the duration of the SMT
   switch. The trade-off is justified here to prevent the administrative
   task of SMT switching from stalling for an unacceptable duration on
   large systems.
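For clarity, here is a rough sketch of the intended flow. It is
illustrative only and is not the code from the patches:
smt_toggle_batched() and smt_switch_one_cpu() are made-up names standing
in for the existing SMT control path in kernel/cpu.c.

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/rcupdate.h>
#include <linux/topology.h>

/* Placeholder for the existing per-CPU online/offline step. */
static int smt_switch_one_cpu(unsigned int cpu, bool enable);

/* Illustrative sketch only, not the actual patch. */
static int smt_toggle_batched(bool enable)
{
        unsigned int cpu;
        int ret = 0;

        rcu_expedite_gp();      /* patch 2: expedite RCU grace periods */
        cpus_write_lock();      /* patch 1: take the hotplug lock once  */

        for_each_present_cpu(cpu) {
                if (topology_is_primary_thread(cpu))
                        continue;       /* primary threads are left alone */
                ret = smt_switch_one_cpu(cpu, enable);
                if (ret)
                        break;
        }

        cpus_write_unlock();
        rcu_unexpedite_gp();
        return ret;
}

The expedited-grace-period window is deliberately limited to SMT
switches initiated through /sys/devices/system/cpu/smt/control, so
normal grace-period behaviour is unchanged elsewhere.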
Changes since v1:
Link: https://lore.kernel.org/all/[email protected]/
- Expedite system-wide synchronize_rcu() only when SMT switch operations
  are triggered via the /sys/devices/system/cpu/smt/control interface.

Changes since v2:
Link: https://lore.kernel.org/all/[email protected]/
- Move the declaration of rcu_[un]expedite_gp() to
  include/linux/rcupdate.h. Thanks to Shrikanth for sharing the fix and
  to the kernel test robot for finding the issue [3].

[1] https://lore.kernel.org/all/5f2ab8a44d685701fe36cdaa8042a1aef215d10d.ca...@linux.vnet.ibm.com
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/

Vishal Chourasia (2):
  cpuhp: Optimize SMT switch operation by batching lock acquisition
  cpuhp: Expedite RCU grace periods during SMT operations

 include/linux/rcupdate.h |  8 +++++
 kernel/cpu.c             | 76 +++++++++++++++++++++++++++++-----------
 kernel/rcu/rcu.h         |  4 ---
 3 files changed, 64 insertions(+), 24 deletions(-)

-- 
2.53.0