Hi Peter,

On Sat, Aug 29, 2020 at 09:47:19AM +0200, pet...@infradead.org wrote:
> On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
> > On 8/28/20 4:51 PM, Peter Zijlstra wrote:
> > > So where do things go side-ways?
> >
> > During hotplug stress test, we have noticed that while a sibling is in
> > pick_next_task, another sibling can go offline or come online. What
> > we have observed is smt_mask get updated underneath us even if
> > we hold the lock. From reading the code, looks like we don't hold the
> > rq lock when the mask is updated. This extra logic was to take care of that.
>
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?
>
> On Fri, Aug 28, 2020 at 06:23:55PM -0400, Joel Fernandes wrote:
> > Thanks Vineeth. Peter, also the "v6+" series (which were some addons on v6)
> > detail the individual hotplug changes squashed into this patch:
> > https://lore.kernel.org/lkml/20200815031908.1015049-9-j...@joelfernandes.org/
> > https://lore.kernel.org/lkml/20200815031908.1015049-11-j...@joelfernandes.org/
>
> That one looks fishy, the pick is core wide, making that pick_seq per rq
> just doesn't make sense.
I think Vineeth was trying to handle the case where rq->core_pick happened to
be NULL for an offline CPU, and then schedule() is called when it came online
but its sched_seq != core-wide pick_seq. This situation arises because a
sibling did the selection for the offline CPU and ended up leaving its
rq->core_pick as NULL, as the then-offline CPU was missing from the
cpu_smt_mask, but it incremented the core-wide pick_seq anyway. Due to this,
pick_next_task() can crash after entering this if() block:

+	if (rq->core_pick_seq == rq->core->core_task_seq &&
+	    rq->core_pick_seq != rq->core_sched_seq) {

How would you suggest to fix it? Maybe we can just assign
rq->core_sched_seq = rq->core_pick_seq for an offline CPU (or any CPU where
rq->core_pick == NULL), so it does not end up using rq->core_pick and does a
full core-wide selection again when it comes online? Or easier, check for
rq->core_pick == NULL and skip this fast-path if() block completely.

> > https://lore.kernel.org/lkml/20200815031908.1015049-12-j...@joelfernandes.org/
>
> This one reads like tinkering, there is no description of the actual
> problem just some code that makes a symptom go away.
>
> Sure, on hotplug the smt mask can change, but only for a CPU that isn't
> actually scheduling, so who cares.
>
> /me re-reads the hotplug code...
>
> ..ooOO is the problem that we clear the cpumasks on take_cpu_down()
> instead of play_dead() ?! That should be fixable.

I think Vineeth explained this in his email: there is logic across the loops
in pick_next_task() that depends on the cpu_smt_mask not changing. I am not
sure play_dead() will fix it; the issue is seen in the code doing the
selection while the cpu_smt_mask changes under it, due to possibly other
CPUs going offline.
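For what it's worth, the easier option could look something like the below
(just a rough sketch, untested, reusing the field names from this patch):

	if (rq->core_pick_seq == rq->core->core_task_seq &&
	    rq->core_pick_seq != rq->core_sched_seq) {
		/*
		 * A sibling may have done the core-wide pick while we were
		 * offline and left our rq->core_pick NULL, while still
		 * bumping pick_seq. Resync the sequence and fall through to
		 * a full core-wide selection instead of dereferencing NULL.
		 */
		if (!rq->core_pick)
			rq->core_sched_seq = rq->core_pick_seq;
		else
			next = rq->core_pick;
	}

That way a CPU coming back online never consumes a stale/NULL pick.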
For example, you have a splat and NULL pointer dereference possibilities in
the below loop if rq_i->core_pick == NULL, because a sibling CPU came online
but a task was not selected for it in the for loops prior to this one:

	/*
	 * Reschedule siblings
	 *
	 * NOTE: L1TF -- at this point we're no longer running the old task and
	 * sending an IPI (below) ensures the sibling will no longer be running
	 * their task. This ensures there is no inter-sibling overlap between
	 * non-matching user state.
	 */
	for_each_cpu(i, smt_mask) {
		struct rq *rq_i = cpu_rq(i);

		WARN_ON_ONCE(!rq_i->core_pick);

		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
			rq_i->core_forceidle = true;

		rq_i->core_pick->core_occupation = occ;

Probably the code can be rearchitected to not depend on cpu_smt_mask
changing. What I did in my old tree is I made a copy of the cpu_smt_mask at
the beginning of this function, and that makes all the problems go away. But
I was afraid of the overhead of that copying. (btw, I would not complain one
bit if this function was nuked and rewritten to be simpler).

> > https://lore.kernel.org/lkml/20200815031908.1015049-13-j...@joelfernandes.org/
>
> This is the only one that makes some sense, it makes rq->core consistent
> over hotplug.

Cool, at least we got one thing right ;)

> > Agreed we can split the patches for the next series, however for final
> > upstream merge, I suggest we fix hotplug issues in this patch itself so that
> > we don't break bisectability.
>
> Meh, who sodding cares about hotplug :-). Also you can 'fix' such things
> by making sure you can't actually enable core-sched until after
> everything is in place.

Fair enough :)

thanks,

 - Joel
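P.S. In case it helps the discussion, the cpu_smt_mask snapshot from my old
tree was roughly this shape (from memory, untested):

	cpumask_t select_mask;

	/*
	 * Snapshot the sibling mask once, under rq->lock, so every loop in
	 * this function iterates the same stable set of CPUs even if
	 * cpu_smt_mask changes concurrently due to hotplug.
	 */
	cpumask_copy(&select_mask, smt_mask);

	for_each_cpu(i, &select_mask) {
		...
	}

The copy (and the on-stack cpumask_t with a large NR_CPUS) is exactly the
overhead I was worried about.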