On Wed, Feb 17, 2021 at 08:43:59PM +0100, Sebastian Andrzej Siewior wrote:
> On 2021-02-17 11:19:07 [-0800], Paul E. McKenney wrote:
> > > Ah. One nice thing is that you can move the RCU threads to a
> > > housekeeping CPU - away from the CPU(s) running the RT tasks. Would
> > > this scenario still be affected (if ksoftirqd were blocked)?
> >
> > At this point, I am going to say that it is the sysadm's job to place
> > the rcuo kthreads, and if they are placed poorly, life is hard.
>
> Good. Because that is what I suggest :)
>
> > > Oh. One thing I forgot to mention: the timer_list timer is nice in
> > > terms of moving forward (the timer did not fire, the condition is
> > > true, and you move the timeout forward).
> > > An hrtimer, on the other hand, needs to be removed, forwarded, and
> > > added back to the "timer tree". This is considered more expensive,
> > > especially if the timer does not fire.
> >
> > There are some timers that are used to cause a wakeup to happen from
> > a clean environment, but maybe these can instead use irq-work.
>
> irq-work also has a "hard" mode because people ended up throwing
> everything in there.
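To make the cost difference above concrete, here is a minimal
illustrative sketch, not code from any patch in this thread; the two
timers and the defer_one_more_second() helper are hypothetical:

#include <linux/timer.h>
#include <linux/hrtimer.h>
#include <linux/jiffies.h>

static struct timer_list my_timer;	/* hypothetical timer_list timer */
static struct hrtimer my_hrtimer;	/* hypothetical hrtimer */

static void defer_one_more_second(void)
{
	/*
	 * timer_list: a pending timer is simply requeued within the
	 * timer wheel, which stays cheap whether or not it ever fires.
	 */
	mod_timer(&my_timer, jiffies + HZ);

	/*
	 * hrtimer: the timer must be removed from and reinserted into
	 * the timerqueue rbtree, which is the extra expense mentioned
	 * above, paid even if the timer never ends up firing.
	 */
	hrtimer_cancel(&my_hrtimer);
	hrtimer_start(&my_hrtimer, ms_to_ktime(1000), HRTIMER_MODE_REL);
}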
So after beating on this a bit, I decided that the timer problems are
specific to rcutorture. The problem is making a test that reliably and
quickly finds RCU priority-boosting failures without destroying the
rest of the kernel's ability to function. The current approach runs a
CPU-bound FIFO priority-1 real-time process on each online CPU, with RT
throttling disabled, all of them concurrently for a four-second period.
This is of course a great way to destroy the rest of the kernel, and in
any other context it would be abuse that was expected to do exactly
that.

So rather than make large changes to RCU, I instead check whether
rcutorture is built into the kernel, and in that case crank all of the
ksoftirqd kthreads up to FIFO priority 2, the same as the various RCU
kthreads. This is a simple change, and if you are running a built-in
rcutorture in production you are in a world of hurt anyway, so realtime
ksoftirqd kthreads are the least of your problems.
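For concreteness, a minimal sketch of one way to do that; the helper
name is hypothetical, and the actual change may well hook this in
elsewhere, for example when the ksoftirqd kthreads are spawned:

#include <linux/cpu.h>
#include <linux/interrupt.h>
#include <linux/sched.h>
#include <uapi/linux/sched/types.h>

/* Hypothetical helper: raise all ksoftirqd kthreads to FIFO priority 2. */
static void rcutorture_elevate_ksoftirqd(void)
{
	struct sched_param sp = { .sched_priority = 2 };
	int cpu;

	if (!IS_BUILTIN(CONFIG_RCU_TORTURE_TEST))
		return;	/* Only when rcutorture is built into the kernel. */

	/* CPUs coming online later would need the same treatment. */
	for_each_online_cpu(cpu) {
		struct task_struct *t = per_cpu(ksoftirqd, cpu);

		if (t)
			sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
	}
}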
> > That it can! Aravinda Prasad prototyped a mechanism for hinting to
> > the hypervisor in such cases, but I don't know that this ever saw
> > the light of day.
>
> Ah, good to know.
>
> > > My understanding of the need for RCU boosting is to get a task,
> > > preempted (by an RT task) within an RCU section, back on the CPU
> > > to at least close the RCU section, so that it is possible to run
> > > RCU callbacks and free memory.
> > > The 10 seconds without RCU callbacks shouldn't be bad unless the
> > > OOM killer got nervous (and if we had memory-allocation failures).
> > > Also, running thousands of accumulated callbacks isn't good either.
> >
> > Sounds good, thank you!
>
> I hope my understanding was correct. Glad to be of service :)

So far, so good! ;-)

After reworking rcu_torture_boost() to use the polling interfaces, thus
getting callback invocation out of the way, there did turn out to be a
bug in RCU priority boosting. If you had a system with more than one
CPU, but also an rcu_node structure with only one CPU, any task that
blocked while running on that one CPU would never be RCU priority
boosted.

How did this happen? One step at a time...

o	A bug I was chasing a few years back reproduced much more
	quickly if I enabled CPU hotplug on TREE03.

o	In addition, x86 no longer supports configurations where CPUs
	cannot be hotplugged (mumble mumble security mumble mumble),
	which means that the rcutorture scripting is always going to
	test CPU hotplug.

o	TREE03 was the one scenario that tested RCU priority boosting.

o	But RCU priority-boost testing assumed that CPU hotplug was
	disabled, so much so that if it saw any CPU-hotplug work going
	on, it would disable itself.

o	So RCU priority boosting has gone completely untested for quite
	a few years.

o	Quite a few more years back, I learned that firmware sometimes
	lies about the number of CPUs. I learned this from bug reports
	noting that RCU was creating way more kthreads than made any
	sense on small systems.

o	So the spawning of kthreads that are per-CPU or
	per-group-of-CPUs is done at CPU-online time, which ensures
	that systems get the right number of RCU kthreads even in the
	presence of lying firmware. In the case of the RCU boost
	kthreads, the code verifies that the rcu_node structure in
	question has at least one online CPU before spawning the
	corresponding kthread.

o	Except that changes in CPU hotplug mean that it is quite
	possible for the incoming CPU to not yet be fully online at the
	time that rcutree_online_cpu() executes. This means that the
	RCU boost kthread will be spawned only when the second CPU for
	a given rcu_node structure comes online.

o	Which means that rcu_node structures that have only one CPU
	never get an RCU boost kthread. This is unusual, requiring 17,
	33, 49, 65, ... CPUs on the system given default values for the
	RCU kconfig options: with the default CONFIG_RCU_FANOUT_LEAF=16,
	each leaf rcu_node structure covers up to 16 CPUs, so a 17-CPU
	system leaves the 17th CPU alone on its second leaf. But it can
	be made to happen, especially when using the rcutorture
	scripting. ('--kconfig "CONFIG_NR_CPUS=17" ...')

The fix is to push the check for at least one online CPU up into the
loop that is executed during early boot, which means that the calls
from rcutree_online_cpu() can be moved to rcutree_prepare_cpu() and can
simply assume that the CPU will eventually come online. The patch is
shown below.

The rcu_torture_boost() function required additional rework because
CPUs can fail to pass through a quiescent state for some seconds from
time to time, and there is nothing that RCU priority boosting can do
about this. I therefore added checks for this condition and refrain
from reporting an error in such cases.
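For reference, the polling pattern mentioned earlier looks roughly like
this; the function below and its use are illustrative only, assuming
the grace-period polling API, not the actual rcutorture code:

#include <linux/rcupdate.h>

static void boost_test_poll_example(void)	/* hypothetical */
{
	unsigned long gp_state;

	/* Snapshot RCU's state, starting a grace period if needed. */
	gp_state = start_poll_synchronize_rcu();

	/* ... run the FIFO priority-1 load for a while ... */

	/* No callback needed: just ask whether that grace period ended. */
	if (!poll_state_synchronize_rcu(gp_state))
		pr_info("Grace period still not complete.\n");
}

A boosting failure then shows up as a grace period that never
completes, rather than as callbacks that never get invoked.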
I also created a new BUSTED-BOOST scenario that tests RCU priority
boosting on a kernel built with CONFIG_RCU_BOOST=y. This fails within a
few tens of seconds, so the test might actually be working.

If testing continues to go well, I expect to submit these into the
v5.14 merge window (not the upcoming one, but the one after that).

More than you probably wanted to know, but at least both the boosting
and its tests now seem to be doing something useful.

							Thanx, Paul

------------------------------------------------------------------------

commit 8193d3465bc8fb8bd3ee765c42df2f551c01329a
Author: Paul E. McKenney <paul...@kernel.org>
Date:   Mon Apr 5 20:42:09 2021 -0700

    rcu: Make RCU priority boosting work on single-CPU rcu_node structures
    
    When any CPU comes online, it checks to see if an RCU-boost kthread
    has already been created for that CPU's leaf rcu_node structure, and
    if not, it creates one. Unfortunately, it also verifies that this
    leaf rcu_node structure actually has at least one online CPU, and if
    not, it declines to create the kthread. Although this behavior makes
    sense during early boot, especially on systems that claim far more
    CPUs than they actually have, it makes no sense for the first CPU to
    come online for a given rcu_node structure. There is no point in
    checking because we know there is a CPU on its way in.
    
    The problem is that timing differences can cause this incoming CPU to
    not yet be reflected in the various bit masks even at
    rcutree_online_cpu() time, and there is no chance at
    rcutree_prepare_cpu() time. Plus it would be better to create the
    RCU-boost kthread at rcutree_prepare_cpu() time to handle the case
    where the CPU is involved in an RCU priority inversion very shortly
    after it comes online.
    
    This commit therefore moves the checking to rcu_spawn_boost_kthreads(),
    which is invoked only at early boot, when the check is appropriate.
    In addition, it makes rcutree_prepare_cpu() invoke
    rcu_spawn_one_boost_kthread(), which no longer does any checking for
    online CPUs.
    
    With this change, RCU priority boosting tests now pass for short
    rcutorture runs, even with single-CPU leaf rcu_node structures.
    
    Cc: Sebastian Andrzej Siewior <bige...@linutronix.de>
    Cc: Scott Wood <sw...@redhat.com>
    Cc: Thomas Gleixner <t...@linutronix.de>
    Signed-off-by: Paul E. McKenney <paul...@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 03282196953a..5c214705c33f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4167,7 +4167,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
 	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-	rcu_prepare_kthreads(cpu);
+	rcu_spawn_one_boost_kthread(rnp);
 	rcu_spawn_cpu_nocb_kthread(cpu);
 	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus + 1);
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 007be551803e..305cf6aeb408 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -417,8 +417,8 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
 static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
 static bool rcu_is_callbacks_kthread(void);
 static void rcu_cpu_kthread_setup(unsigned int cpu);
+static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp);
 static void __init rcu_spawn_boost_kthreads(void);
-static void rcu_prepare_kthreads(int cpu);
 static void rcu_cleanup_after_idle(void);
 static void rcu_prepare_for_idle(void);
 static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 6bf72c4132a6..464b16132866 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1188,22 +1188,16 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
  */
 static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
 {
-	int rnp_index = rnp - rcu_get_root();
 	unsigned long flags;
+	int rnp_index = rnp - rcu_get_root();
 	struct sched_param sp;
 	struct task_struct *t;
 
-	if (!IS_ENABLED(CONFIG_PREEMPT_RCU))
-		return;
-
-	if (!rcu_scheduler_fully_active || rcu_rnp_online_cpus(rnp) == 0)
+	if (rnp->boost_kthread_task || !rcu_scheduler_fully_active)
 		return;
 
 	rcu_state.boost = 1;
 
-	if (rnp->boost_kthread_task != NULL)
-		return;
-
 	t = kthread_create(rcu_boost_kthread, (void *)rnp,
 			   "rcub/%d", rnp_index);
 	if (WARN_ON_ONCE(IS_ERR(t)))
@@ -1255,17 +1249,8 @@ static void __init rcu_spawn_boost_kthreads(void)
 	struct rcu_node *rnp;
 
 	rcu_for_each_leaf_node(rnp)
-		rcu_spawn_one_boost_kthread(rnp);
-}
-
-static void rcu_prepare_kthreads(int cpu)
-{
-	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-	struct rcu_node *rnp = rdp->mynode;
-
-	/* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */
-	if (rcu_scheduler_fully_active)
-		rcu_spawn_one_boost_kthread(rnp);
+		if (rcu_rnp_online_cpus(rnp))
+			rcu_spawn_one_boost_kthread(rnp);
 }
 
 #else /* #ifdef CONFIG_RCU_BOOST */
@@ -1285,15 +1270,15 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
 {
 }
 
-static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
+static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
 {
 }
 
-static void __init rcu_spawn_boost_kthreads(void)
+static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 {
 }
 
-static void rcu_prepare_kthreads(int cpu)
+static void __init rcu_spawn_boost_kthreads(void)
 {
 }