On 2025-04-25, Nam Cao <[email protected]> wrote:
> On Thu, Apr 24, 2025 at 03:55:34PM +0200, Gabriele Monaco wrote:
>> I've been playing with these monitors, code-wise they look good.
>> I tested a bit and they seem to work without many surprises by doing
>> something as simple as:
>>
>> perf stat -e rv:error_sleep stress-ng --cpu-sched 1 -t 10s
>> -- shows several errors --
>
> This one is a monitor's bug.
>
> The monitor mistakenly sees the task getting woken up, *then* sees it going
> to sleep.
>
> This is due to trace_sched_switch() being called with a stale 'prev_state'.
> 'prev_state' is read at the beginning of __schedule(), but
> trace_sched_switch() is invoked a bit later. Therefore if task->__state is
> changed inbetween, 'prev_state' is not the value of task->__state.
>
> The monitor checks (prev_state & TASK_INTERRUPTIBLE) to determine if the
> task is going to sleep. This can be incorrect due to the race above. The
> monitor sees the task going to sleep, but actually it is just preempted.

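For reference, the sequence described in the quoted text can be modeled
in user space. This is only a sketch under stated assumptions, not kernel
code: struct model_task, the MODEL_* constants, monitor_sees_sleep() and
model_schedule() are hypothetical names, and the single-threaded
"state changes in between" flag stands in for whatever actually modifies
task->__state. It shows that a snapshot taken early can differ from the
live state at report time; it does not settle whether the task slept or
was preempted.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's task state bits. */
#define MODEL_TASK_RUNNING       0x0u
#define MODEL_TASK_INTERRUPTIBLE 0x1u

/* Hypothetical task model; 'state' plays the role of task->__state. */
struct model_task {
	unsigned int state;
};

/* What a monitor keyed on the reported snapshot would conclude. */
static int monitor_sees_sleep(unsigned int prev_state)
{
	return (prev_state & MODEL_TASK_INTERRUPTIBLE) != 0;
}

/*
 * Model of the ordering described above:
 *   1. snapshot the state early,
 *   2. optionally let the live state change,
 *   3. report the snapshot to the monitor.
 * Returns 1 if the reported snapshot no longer matches the live state.
 */
static int model_schedule(struct model_task *t, int state_changes_in_between,
			  int *reported_sleep)
{
	unsigned int prev_state = t->state;	/* read at the beginning */

	if (state_changes_in_between)
		t->state = MODEL_TASK_RUNNING;	/* changed before the report */

	*reported_sleep = monitor_sees_sleep(prev_state);	/* report is late */

	if (prev_state != t->state)
		printf("reported 0x%x, live state is 0x%x\n",
		       prev_state, t->state);

	return prev_state != t->state;
}
```

Calling model_schedule(&t, 1, &slept) on a task whose state is
MODEL_TASK_INTERRUPTIBLE reports a sleep while the live state is already
MODEL_TASK_RUNNING. With 0 as the second argument, the report and the
live state agree.
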
If I understand this correctly, trace_sched_switch() is reporting
accurate state transition information, but by the time it is reported
that state may have already changed (in which case another
trace_sched_switch() occurs later).

So in this example, the task did go to sleep. Why do you think it was
preempted instead?

John Ogness
