On Mon, Jul 20, 2020 at 12:59:24PM +0200, pet...@infradead.org wrote:
> On Mon, Jul 20, 2020 at 10:41:06AM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 20, 2020 at 10:26:58AM +0200, Oleg Nesterov wrote:
> > > Peter,
> > > 
> > > Let me add another note. TASK_TRACED/TASK_STOPPED was always protected by
> > > ->siglock. In particular, ttwu(__TASK_TRACED) must be always called with
> > > ->siglock held. That is why ptrace_freeze_traced() assumes it can safely
> > > do s/TASK_TRACED/__TASK_TRACED/ under spin_lock(siglock).
> > > 
> > > Can this change race with
> > > 
> > >           if (signal_pending_state(prev->state, prev)) {
> > >                   prev->state = TASK_RUNNING;
> > >           }
> > > 
> > > in __schedule() ? Hopefully not, signal-state is protected by siglock too.
> > > 
> > > So I think this logic was correct even if it doesn't look nice. But
> > > "doesn't look nice" is true for the whole ptrace code ;)
> > 
> > *groan*... another bit of obscure magic :-(
> > 
> > let me go try and wake up and figure out how best to deal with this.

This then? That seems to survive the strace thing.
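
For reference, the flip Oleg is talking about is ptrace_freeze_traced() in
kernel/ptrace.c. Roughly, as of the v5.8-era source (abridged; details vary
by version), together with the relevant state bits from
include/linux/sched.h:

/* include/linux/sched.h (v5.8-era values, abridged) */
#define __TASK_TRACED		0x0008
#define TASK_WAKEKILL		0x0100
#define TASK_TRACED		(TASK_WAKEKILL | __TASK_TRACED)

/*
 * kernel/ptrace.c (abridged): drop TASK_WAKEKILL from ->state under
 * siglock, so a fatal-signal wakeup (signal_wake_up(), which passes
 * TASK_WAKEKILL) no longer matches the tracee's state while the
 * tracer pokes at it.
 */
static bool ptrace_freeze_traced(struct task_struct *task)
{
	bool ret = false;

	/* Lockless, nobody but us can set this flag */
	if (task->jobctl & JOBCTL_LISTENING)
		return ret;

	spin_lock_irq(&task->sighand->siglock);
	if (task_is_traced(task) && !__fatal_signal_pending(task)) {
		task->state = __TASK_TRACED;
		ret = true;
	}
	spin_unlock_irq(&task->sighand->siglock);

	return ret;
}

ptrace_unfreeze_traced() undoes this, again under siglock, which is why the
WARN in the hunk below only has to tolerate the TRACED-flavoured flip.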

---
 kernel/sched/core.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e15543cb84812..b5973d7fa521c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4100,9 +4100,9 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
  */
 static void __sched notrace __schedule(bool preempt)
 {
+       unsigned long prev_state, tmp_state;
        struct task_struct *prev, *next;
        unsigned long *switch_count;
-       unsigned long prev_state;
        struct rq_flags rf;
        struct rq *rq;
        int cpu;
@@ -4140,16 +4140,38 @@ static void __sched notrace __schedule(bool preempt)
        rq_lock(rq, &rf);
        smp_mb__after_spinlock();
 
+       /*
+        * We must re-load prev->state in case ttwu_remote() changed it
+        * before we acquired rq->lock.
+        */
+       tmp_state = prev->state;
+       if (unlikely(prev_state != tmp_state)) {
+               /*
+                * ptrace_{,un}freeze_traced() think it is cool to change
+                * ->state around behind our backs between TASK_TRACED and
+                *  __TASK_TRACED.
+                *
+                * This is safe because this, as well as any __TASK_TRACED
+                * wakeups are under siglock.
+                *
+                * For any other case, a changed prev_state must be to
+                * TASK_RUNNING, such that when it blocks, the load has
+                * happened before the smp_mb().
+                *
+                * Also see the comment with deactivate_task().
+                */
+               SCHED_WARN_ON(tmp_state && (prev_state & __TASK_TRACED &&
+                                          !(tmp_state & __TASK_TRACED)));
+
+               prev_state = tmp_state;
+       }
+
        /* Promote REQ to ACT */
        rq->clock_update_flags <<= 1;
        update_rq_clock(rq);
 
        switch_count = &prev->nivcsw;
-       /*
-        * We must re-load prev->state in case ttwu_remote() changed it
-        * before we acquired rq->lock.
-        */
-       if (!preempt && prev_state && prev_state == prev->state) {
+       if (!preempt && prev_state) {
                if (signal_pending_state(prev_state, prev)) {
                        prev->state = TASK_RUNNING;
                } else {

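To make the window concrete, here is a userspace sketch of the pattern the
hunk implements: sample ->state locklessly, re-load it once the lock is
held, and reconcile the two, tolerating only the benign changes. This is an
analogy, not kernel code; a single mutex stands in for both rq->lock and
siglock, and all the names (state, state_lock, freezer, ...) are invented:

/*
 * Userspace sketch of the __schedule() pattern above. A "scheduler"
 * thread samples state locklessly, then re-loads it under the lock
 * and checks that only the expected concurrent changes happened.
 */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define RUNNING  0x0000
#define TRACED   0x0008	/* stands in for __TASK_TRACED */
#define WAKEKILL 0x0100	/* stands in for TASK_WAKEKILL */

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER; /* ~siglock */
/* volatile as a crude stand-in for READ_ONCE(); real code wants atomics */
static volatile unsigned long state = WAKEKILL | TRACED;  /* ~TASK_TRACED */

/* ~ptrace_freeze_traced(): flip WAKEKILL|TRACED -> TRACED under the lock */
static void *freezer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&state_lock);
	if (state & TRACED)
		state = TRACED;
	pthread_mutex_unlock(&state_lock);
	return NULL;
}

/* ~__schedule(): lockless sample, then an authoritative re-load */
static void *scheduler(void *arg)
{
	unsigned long prev_state, tmp_state;

	(void)arg;
	prev_state = state;		/* lockless, may go stale */

	pthread_mutex_lock(&state_lock);
	tmp_state = state;		/* re-load under the lock */
	if (prev_state != tmp_state) {
		/*
		 * Only benign changes may be observed here: either a
		 * wakeup to RUNNING, or a flip that preserves TRACED.
		 */
		assert(tmp_state == RUNNING ||
		       (prev_state & TRACED) == (tmp_state & TRACED));
		prev_state = tmp_state;
	}
	pthread_mutex_unlock(&state_lock);

	printf("settled on state %#lx\n", prev_state);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, freezer, NULL);
	pthread_create(&b, NULL, scheduler, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Build with cc -pthread; the assert documents exactly which concurrent
state changes the scheduler side is prepared to see, which is the same
contract the SCHED_WARN_ON in the patch enforces.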