On Fri, 5 Sep 2025, Julian Ganz wrote:
September 5, 2025 at 1:38 PM, "BALATON Zoltan" wrote:
On Thu, 4 Sep 2025, Julian Ganz wrote:
 Even with the existing interfaces, it is more or less possible to
 discern these situations, e.g. as done by the cflow plugin. However,
doing so imposes considerable overhead on top of the core analysis one may
actually intend to perform.

I'd rather have the overhead in the plugin than in interrupt and exception
handling on every target, unless this can be completely disabled somehow
when not needed, so that it adds no overhead to interrupt handling in the
guest.

The "more or less" is rather heavy here: with the current API there is
no way to distinguish between interrupts and exceptions. Double-traps
can probably only be detected if you don't rely on weird, very error
prone heuristics around TB translations.
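
To illustrate, below is roughly what such a heuristic ends up looking like
when built only on the existing TB callbacks. This is a minimal sketch, not
the actual cflow code: the TBInfo struct and the single global
expected-next-PC slot (instead of proper per-vCPU state) are made up for the
example.

#include <stdint.h>
#include <glib.h>
#include <qemu-plugin.h>

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

typedef struct {
    uint64_t start;        /* vaddr of the first insn in the TB */
    uint64_t fallthrough;  /* vaddr right after the last insn */
} TBInfo;

/* A real plugin would need one slot per vCPU and locking; a single
 * global slot keeps the sketch short. */
static uint64_t expected_next;

static void tb_exec(unsigned int vcpu_index, void *udata)
{
    TBInfo *info = udata;

    if (expected_next && info->start != expected_next) {
        /* Could be a taken branch, an interrupt, an exception, or a
         * return from one; the plugin cannot tell which without
         * further guesswork (e.g. comparing against known vector
         * addresses). */
        qemu_plugin_outs("control-flow discontinuity\n");
    }
    expected_next = info->fallthrough;
}

static void tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    size_t n = qemu_plugin_tb_n_insns(tb);
    struct qemu_plugin_insn *last = qemu_plugin_tb_get_insn(tb, n - 1);
    TBInfo *info = g_new0(TBInfo, 1);  /* never freed; fine for a sketch */

    info->start = qemu_plugin_tb_vaddr(tb);
    info->fallthrough = qemu_plugin_insn_vaddr(last)
                        + qemu_plugin_insn_size(last);

    qemu_plugin_register_vcpu_tb_exec_cb(tb, tb_exec,
                                         QEMU_PLUGIN_CB_NO_REGS, info);
}

QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                           const qemu_info_t *info,
                                           int argc, char **argv)
{
    qemu_plugin_register_vcpu_tb_trans_cb(id, tb_trans);
    return 0;
}

Note that this registers an exec callback on every TB, so every guest
branch pays for it; that is exactly the kind of overhead I would like to
move out of the plugin.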

And as Alex Bennée pointed out, QEMU can easily be built with plugins
disabled.

Have you done any testing of how much overhead this adds to
interrupt-heavy guest workloads? At least for PPC these are already much
slower than a real CPU, so I'd like it to get faster, not slower.

No, I have not made any performance measurements. However, given that a
similar hook is already called for every single TB execution, the impact
compared to the existing plugin infrastructure _should_ be negligible.

That is, provided your workload actually runs some code and is not
constantly bombarded with interrupts that _do_ result in a trap (which
_may_ happen in some tests).

So if you are performance-sensitive enough to care, you will very likely
want to disable plugins anyway.

I can disable plugins and normally do, but that does not help those who get QEMU from their distro (i.e. most users). If this infrastructure were disabled in default builds and needed an explicit option to enable, then those who need it could enable it without imposing it on everyone else who just gets a default build from a distro and never uses plugins. An option that needs a rebuild is effectively no option at all for most people.

I guess the question is which is the larger group: those who just run guests, or those who use this instrumentation with plugins? The default should probably match what the larger group needs. Even then, distros may change the default, so it would be best if the overhead could be minimised even when the feature is enabled. I think the log infrastructure does that; would a similar solution work here?
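
For reference, the pattern I have in mind is the one
qemu_log_mask()/qemu_loglevel_mask() use: when nothing is enabled, the call
site only pays for a test of a global mask bit. Below is a self-contained
sketch of how a similar guard could look around the new hook. Every name in
it is made up for illustration; none of this exists in QEMU today, only the
shape follows the log code.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

enum { EVENT_INTERRUPT = 1u << 0, EVENT_EXCEPTION = 1u << 1 };

/* set once, e.g. when a plugin subscribes to the event */
static uint32_t event_mask;

static void report_event(uint32_t kind, uint64_t from, uint64_t to)
{
    printf("event %" PRIu32 ": 0x%" PRIx64 " -> 0x%" PRIx64 "\n",
           kind, from, to);
}

/* what the hook site in the exception path would boil down to */
static inline void maybe_report_event(uint32_t kind,
                                      uint64_t from, uint64_t to)
{
    if (unlikely(event_mask & kind)) {
        report_event(kind, from, to);
    }
}

int main(void)
{
    maybe_report_event(EVENT_INTERRUPT, 0x1000, 0x100);  /* no output */

    event_mask |= EVENT_INTERRUPT;                       /* "plugin loaded" */
    maybe_report_event(EVENT_INTERRUPT, 0x2000, 0x100);  /* printed */
    return 0;
}

A single well-predicted branch in the exception path should be much cheaper
than an unconditional call into the plugin layer, but only measurements can
confirm that.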

For testing, I've found that because embedded PPC CPUs have a software-controlled MMU (and in addition QEMU may flush TLB entries too often), running something that does a lot of memory accesses, like the STREAM benchmark on sam460ex, is hit by this IIRC, but anything else that causes a lot of interrupts, such as reading from an emulated disk or sound playback, is probably affected as well. I've tried to optimise PPC exception handling a bit before, but whenever I optimise something it is later undone by other changes that don't care about performance.

Regards,
BALATON Zoltan
