On Thu, Jun 11, 2026 at 12:31, Valentin Schneider
<[email protected]> wrote:
> >
> > Isn't that precisely what the ipi tracepoints used by this
> > implementation (ipi:ipi_send_cpu) are for?
> >
>
> Well, these catch the emission of the IPI, which is great for investigation
> - slap a stacktrace trigger and you (most of the time) get the source of
> your interference.
>
> However Crystal's point is that on x86 (and I assume other archs) receiving
> & handling these IPIs is "special" and doesn't go through the generic irq
> subsystem and thus has to be tracked separately, which is why osnoise has
> this fairly lengthy osnoise_arch_register() thing.
>

Ah, right. This is not IPI specific, though, IIUC - Intel also has
other IRQs that have to be traced using Intel-specific trace points,
like irq_vectors:local_timer, which is also handled in
osnoise_arch_register(). On ARM, from what I recall, most (all?) IRQs
are traced with the generic irq:* tracepoints.

So there are two parts to this:

- Detecting interference from IPIs firing as osnoise:irq_noise (to be
analyzed by timerlat auto-analysis; it also appears by default in
trace output, regardless of the tool, as all osnoise:* tracepoints
are enabled there). This is done locally using the already existing
path (no race hazard), but requires arch-specific detection.

- Counting IPIs when they are being sent. This is the new feature, and
the count is being recorded in osnoise_sample.

I guess that means that if there were a generic interface for IPI
reception, it would be easier to use that for IPI counting, as the
event would be CPU-local? As you say, for tracing the IPI source, the
sending tracepoints are better, and you can already dump their stack
trace with --event/--trigger. timerlat auto-analysis could be extended
to connect the specific IPI to the IRQ noise and display its stack
trace automatically, instead of requiring manual analysis of the trace
output.

> >> Isn't this racy to do from a different CPU?  Both in terms of the
> >> counter, and the timing of the increment relative to when the IPI is
> >> actually received.  Not necessarily a huge deal if you only care about
> >> zero versus bignum, but still.  At least worth a comment, if we go with
> >> this approach.
> >>
> >
> > I also think it's a bit confusing, especially as the other accesses to
> > osn_var are cpu-local, but here, "cpu" is the *target* CPU, not the
> > current CPU. Not sure how expensive it would be to do atomic_add for
> > that, at least it's something to consider.
> >
>
> I suppose that could be an argument for doing that stat aggregation in
> userspace osnoise - event handlers are run after the fact via
> tracefs_iterate_raw_events(), it's all inherently slower since it's just
> increments of one (one per handled event) but it's also all done in
> userspace on a control thread and doesn't bog down the kernelspace.
>

You can also do per-cpu counters in-kernel and sum them at the end,
but that would take cpus^2 space (indexed by [current_cpu,
target_cpu]). The question is whether there could be enough samples to
overload sample collection in userspace (as happens with timerlat,
which collects data in-kernel using BPF instead).
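
To make the cpus^2 idea concrete, here is a rough userspace sketch in
C. It is not kernel code and not what the patchset does; NR_CPUS, the
array and both function names are made up for illustration. Each
sending CPU writes only to its own row, so the increment needs no
atomics, and the per-target total is the sum of one column.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU count, fixed here only to keep the sketch small. */
#define NR_CPUS 4

/* ipi_sent[sender][target]: row `sender` is written only by that CPU. */
static uint64_t ipi_sent[NR_CPUS][NR_CPUS];

/* Runs on the sending CPU and touches only its own row. */
static void count_ipi(int sender, int target)
{
    assert(sender >= 0 && sender < NR_CPUS);
    assert(target >= 0 && target < NR_CPUS);
    ipi_sent[sender][target]++;
}

/* Sum the column for `target`, e.g. when a sample is collected. */
static uint64_t ipis_sent_to(int target)
{
    uint64_t sum = 0;

    assert(target >= 0 && target < NR_CPUS);
    for (int sender = 0; sender < NR_CPUS; sender++)
        sum += ipi_sent[sender][target];
    return sum;
}
```

The reader still looks at rows owned by other CPUs, so the sum is only
a snapshot, and it counts IPIs sent, not IPIs already received.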

In-kernel counting can be tested with "--event ipi:ipi_send_cpu
--trigger hist:key=cpu" - IIRC, tracefs histograms use atomic
operations (via tracing_map) to protect the entries from races under
concurrent access. Of course, that is inferior to what the patchset
implements, as it doesn't record which osnoise cycle the IPI was sent
in, nor can it record cpumask IPIs.
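
For comparison, the atomic variant mentioned earlier in the thread
(one shared counter per target CPU, incremented from whichever CPU
sends the IPI) would look roughly like this in userspace C11. This is
only a sketch of the general idea, not how tracing_map or the patchset
is implemented; NR_CPUS and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical CPU count, fixed here only to keep the sketch small. */
#define NR_CPUS 4

/* One shared counter per target CPU, written by any sender. */
static _Atomic uint64_t ipi_count[NR_CPUS];

/* Called by the sender; the atomic add keeps concurrent increments intact. */
static void count_ipi_atomic(int target)
{
    assert(target >= 0 && target < NR_CPUS);
    atomic_fetch_add_explicit(&ipi_count[target], 1, memory_order_relaxed);
}

/* Read the current count for `target`. */
static uint64_t read_ipi_count(int target)
{
    assert(target >= 0 && target < NR_CPUS);
    return atomic_load_explicit(&ipi_count[target], memory_order_relaxed);
}
```

This fixes the lost-update race on the counter itself, but not the
timing question: the increment still happens at send time, not when the
target CPU actually handles the IPI.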


Tomas

