On 2025-04-25, John Ogness <[email protected]> wrote:
>>> perf stat -e rv:error_sleep stress-ng --cpu-sched 1 -t 10s
>>> -- shows several errors --
>>
>> This one is a monitor's bug.
>>
>> The monitor mistakenly sees the task getting woken up, *then* sees it going
>> to sleep.
>>
>> This is due to trace_sched_switch() being called with a stale 'prev_state'.
>> 'prev_state' is read at the beginning of __schedule(), but
>> trace_sched_switch() is invoked a bit later. Therefore if task->__state is
>> changed inbetween, 'prev_state' is not the value of task->__state.
>>
>> The monitor checks (prev_state & TASK_INTERRUPTIBLE) to determine if the
>> task is going to sleep. This can be incorrect due to the race above. The
>> monitor sees the task going to sleep, but actually it is just preempted.
>
> If I understand this correctly, trace_sched_switch() is reporting
> accurate state transition information, but by the time it is reported
> that state may have already changed (in which case another
> trace_sched_switch() occurs later).
>
> So in this example, the task did go to sleep. Why do you think it was
> preempted instead?

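For illustration only, the sequence described in the quoted explanation (a
snapshot of the task state taken early, then reported after the state may
have changed) can be modeled with a small user-space sketch. This is not
kernel code: struct task, schedule_model() and monitor_sees_sleep() are
hypothetical names made up for this example, the concurrent wakeup is
reduced to a flag, and only the (prev_state & TASK_INTERRUPTIBLE) check is
taken from the thread.

```c
/* Simplified stand-ins for the kernel's task state bits. */
#define TASK_RUNNING        0x0u
#define TASK_INTERRUPTIBLE  0x1u

/* Hypothetical, minimal model of a task; not the kernel's task_struct. */
struct task {
	unsigned int state;
};

/* The check the monitor is described as doing on the reported state. */
static int monitor_sees_sleep(unsigned int prev_state)
{
	return (prev_state & TASK_INTERRUPTIBLE) != 0;
}

/*
 * Model of the ordering discussed above: the state is sampled first,
 * and the tracepoint (here: the monitor check) runs later with that
 * sample. 'wake_in_between' stands in for a wakeup that changes the
 * task state after the sample was taken.
 *
 * Returns nonzero if the monitor concludes the task went to sleep.
 */
static int schedule_model(struct task *t, int wake_in_between)
{
	unsigned int prev_state = t->state;	/* early snapshot */

	if (wake_in_between)
		t->state = TASK_RUNNING;	/* state changes after snapshot */

	/* Reported value is the snapshot, not the current state. */
	return monitor_sees_sleep(prev_state);
}
```

Calling schedule_model() on a task whose state is TASK_INTERRUPTIBLE with
wake_in_between set returns nonzero while the task's state is already
TASK_RUNNING, which is the mismatch the two readings above disagree about
how to interpret.
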
On 2025-04-25, Gabriele Monaco <[email protected]> wrote:
> Peter's fix [1] landed on next recently, I guess in a couple of days
> you'll get it on the upstream tree and you may not see the problem.

Ah, thanks for pointing that out!

> [1] -
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8feb053d53194382fcfb68231296fdc220497ea6
