On Mon, Jul 20, 2020 at 01:26:23PM +0200, pet...@infradead.org wrote: > On Mon, Jul 20, 2020 at 12:59:24PM +0200, pet...@infradead.org wrote: > > On Mon, Jul 20, 2020 at 10:41:06AM +0200, Peter Zijlstra wrote: > > > On Mon, Jul 20, 2020 at 10:26:58AM +0200, Oleg Nesterov wrote: > > > > Peter, > > > > > > > > Let me add another note. TASK_TRACED/TASK_STOPPED was always protected > > > > by > > > > ->siglock. In particular, ttwu(__TASK_TRACED) must be always called with > > > > ->siglock held. That is why ptrace_freeze_traced() assumes it can safely > > > > do s/TASK_TRACED/__TASK_TRACED/ under spin_lock(siglock). > > > > > > > > Can this change race with > > > > > > > > if (signal_pending_state(prev->state, prev)) { > > > > prev->state = TASK_RUNNING; > > > > } > > > > > > > > in __schedule() ? Hopefully not, signal-state is protected by siglock > > > > too. > > > > > > > > So I think this logic was correct even if it doesn't look nice. But > > > > "doesn't > > > > look nice" is true for the whole ptrace code ;) > > > > > > *groan*... another bit of obscure magic :-( > > > > > > let me go try and wake up and figure out how best to deal with this. > > This then? That seems to survive the strace thing.
Ran a mainline v5.8-rc6 -> reproduced crash Ran a mainling v5.8-rc6 + patch below -> testsuite runs fine/not able to reproduce crash So fwiw Tested-by: Christian Brauner <christian.brau...@ubuntu.com> Acked-by: Christian Brauner <christian.brau...@ubuntu.com> > > --- > kernel/sched/core.c | 34 ++++++++++++++++++++++++++++------ > 1 file changed, 28 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index e15543cb84812..b5973d7fa521c 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -4100,9 +4100,9 @@ pick_next_task(struct rq *rq, struct task_struct *prev, > struct rq_flags *rf) > */ > static void __sched notrace __schedule(bool preempt) > { > + unsigned long prev_state, tmp_state; > struct task_struct *prev, *next; > unsigned long *switch_count; > - unsigned long prev_state; > struct rq_flags rf; > struct rq *rq; > int cpu; > @@ -4140,16 +4140,38 @@ static void __sched notrace __schedule(bool preempt) > rq_lock(rq, &rf); > smp_mb__after_spinlock(); > > + /* > + * We must re-load prev->state in case ttwu_remote() changed it > + * before we acquired rq->lock. > + */ > + tmp_state = prev->state; > + if (unlikely(prev_state != tmp_state)) { > + /* > + * ptrace_{,un}freeze_traced() think it is cool to change > + * ->state around behind our backs between TASK_TRACED and > + * __TASK_TRACED. > + * > + * This is safe because this, as well as any __TASK_TRACED > + * wakeups are under siglock. > + * > + * For any other case, a changed prev_state must be to > + * TASK_RUNNING, such that when it blocks, the load has > + * happened before the smp_mb(). > + * > + * Also see the comment with deactivate_task(). > + */ > + SCHED_WARN_ON(tmp_state && (prev_state & __TASK_TRACED && > + !(tmp_state & __TASK_TRACED))); > + > + prev_state = tmp_state; > + } > + > /* Promote REQ to ACT */ > rq->clock_update_flags <<= 1; > update_rq_clock(rq); > > switch_count = &prev->nivcsw; > - /* > - * We must re-load prev->state in case ttwu_remote() changed it > - * before we acquired rq->lock. > - */ > - if (!preempt && prev_state && prev_state == prev->state) { > + if (!preempt && prev_state) { > if (signal_pending_state(prev_state, prev)) { > prev->state = TASK_RUNNING; > } else {