On Wed, 10 Sep 2025, Alex Bennée wrote:
"Julian Ganz" <[email protected]> writes:
September 10, 2025 at 12:06 PM, "BALATON Zoltan" wrote:
On Tue, 9 Sep 2025, Julian Ganz wrote:
I ran streamPPCpowerpcO3 on qemu with these patches:

 -------------------------------------------------------------
 Function    Best Rate MB/s  Avg time  Min time  Max time
 Copy:               2867.6  0.056828  0.055795  0.061792
 Scale:              1057.5  0.153282  0.151305  0.158115
 Add:                1308.8  0.187095  0.183380  0.193672
 Triad:              1111.6  0.220863  0.215902  0.230440
 -------------------------------------------------------------

 After doing a clean build, with the fans still audible:

 -------------------------------------------------------------
 Function    Best Rate MB/s  Avg time  Min time  Max time
 Copy:               2932.9  0.055131  0.054554  0.055667
 Scale:              1067.9  0.151520  0.149832  0.155000
 Add:                1324.9  0.184807  0.181150  0.191386
 Triad:              1122.0  0.220080  0.213896  0.229302
 -------------------------------------------------------------

What was different between the above two runs? I guess maybe one is with 
plugins disabled but it's not clear from the description.

The difference is nothing but a clean rebuild of qemu. As you can see,
there are fluctuations already. Plugins were enabled in both cases.

On qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
 patches, but plugins enabled:

 -------------------------------------------------------------
 Function    Best Rate MB/s  Avg time  Min time  Max time
 Copy:               2972.1  0.054407  0.053834  0.054675
 Scale:              1068.6  0.151503  0.149726  0.154594
 Add:                1327.6  0.185160  0.180784  0.193181
 Triad:              1127.2  0.219249  0.212915  0.229230
 -------------------------------------------------------------

 And on qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
 patches, with plugins disabled:

 -------------------------------------------------------------
 Function    Best Rate MB/s  Avg time  Min time  Max time
 Copy:               2983.4  0.055141  0.053630  0.060013
 Scale:              1058.3  0.152353  0.151186  0.155072
 Add:                1323.9  0.184707  0.181279  0.188868
 Triad:              1128.2  0.218674  0.212734  0.230314
 -------------------------------------------------------------

 I fail to see any significant indication that these patches, or
 plugins in general, would result in a degradation of performance.

In the worst case, the Copy test, the difference seems to be about 3.5%
(and about 1.7% with plugins disabled?), and it should normally be less
than that. So the series does not add much more overhead to plugins
than there is already, and this should be acceptable. It may still be
interesting to see whether the overhead with plugins disabled can be
avoided in a similar way to how logging does it.

The thing is: that's probably just usual fluctuations. As you have seen
with the first two measurements the values fluctuate quite a bit between
runs of the test on the very same qemu (assuming that a clean build did
not incur any _other_ relevant change). For example, the best rate for
scale shown with plugins enabled is one percent faster than with plugins
disabled. Is this significant? Probably not. Or at least it doesn't make
much sense.
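
To put numbers on the fluctuation argument, here is a minimal sketch that
compares the Best Rate figures quoted in this thread. The data is copied
from the four tables above; the run labels and the helper names
(`percent_slower`, `spread_percent`) are made up for illustration and are
not part of any QEMU or STREAM tooling.

```python
# Best Rate (MB/s) values copied from the four STREAM runs quoted above.
# The dictionary keys are illustrative labels, not names used by the authors.
runs = {
    "patched, first build": {
        "Copy": 2867.6, "Scale": 1057.5, "Add": 1308.8, "Triad": 1111.6},
    "patched, clean rebuild": {
        "Copy": 2932.9, "Scale": 1067.9, "Add": 1324.9, "Triad": 1122.0},
    "unpatched, plugins enabled": {
        "Copy": 2972.1, "Scale": 1068.6, "Add": 1327.6, "Triad": 1127.2},
    "unpatched, plugins disabled": {
        "Copy": 2983.4, "Scale": 1058.3, "Add": 1323.9, "Triad": 1128.2},
}


def percent_slower(candidate, reference):
    """How much lower `candidate` is than `reference`, in percent of `reference`."""
    return (reference - candidate) / reference * 100.0


def spread_percent(values):
    """Range (max - min) of the values, in percent of the maximum."""
    return (max(values) - min(values)) / max(values) * 100.0


for kernel in ("Copy", "Scale", "Add", "Triad"):
    rates = [result[kernel] for result in runs.values()]
    # Difference between two builds of the same patched tree: pure noise,
    # assuming the clean rebuild changed nothing else.
    rebuild_delta = percent_slower(
        runs["patched, first build"][kernel],
        runs["patched, clean rebuild"][kernel],
    )
    print(f"{kernel:6s} spread over all runs: {spread_percent(rates):5.2f}%   "
          f"same-tree rebuild delta: {rebuild_delta:5.2f}%")
```

If the delta between two builds of the same tree is of the same order as
the delta between patched and unpatched trees, a single run per
configuration cannot separate overhead from noise.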

I wouldn't spend too much time chasing this down. As you say this
fluctuation is well within the noise range.

I can recommend hyperfine as a runner:

 https://github.com/sharkdp/hyperfine

as it does some work on how many times you need to run a test before the
results are statistically relevant.
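
For readers without hyperfine at hand, the underlying idea (warm up,
repeat the run several times, then look at mean and standard deviation
rather than a single number) can be sketched in a few lines. This is only
an illustration, not how hyperfine is implemented; the function names and
the placeholder command are assumptions, and the real qemu invocation
would have to be substituted.

```python
import statistics
import subprocess
import sys
import time


def measure(cmd, runs=10, warmup=2):
    """Run `cmd` repeatedly; return the wall-clock seconds of each timed run."""
    for _ in range(warmup):
        # Warm-up runs are executed but not timed (caches, page faults, ...).
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Mean, sample standard deviation, min and max of the timings."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return {
        "mean": mean,
        "stdev": stdev,
        "min": min(timings),
        "max": max(timings),
        # Relative standard deviation: a rough measure of run-to-run noise.
        "rel_stdev_percent": stdev / mean * 100.0,
    }


if __name__ == "__main__":
    # Placeholder command (starts a Python interpreter that does nothing);
    # replace it with the actual qemu command line to be benchmarked.
    cmd = [sys.executable, "-c", "pass"]
    print(summarize(measure(cmd, runs=5, warmup=1)))
```

A difference between two configurations is only worth chasing if it is
clearly larger than the relative standard deviation of each of them.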

I may do some more tests this week, with runtimes longer than a few
seconds if I can find the motivation to set up everything I'd need to
compile your benchmark. In the meantime, you are welcome to make your
own measurements if you want to. The patches are also available at [1]
if you don't want to apply them to your local tree yourself.

Balaton,

I don't think worries about performance impact are justified and Julian
has certainly done enough due diligence here. If you can come up with a
repeatable test that shows a measurable impact then please do so.

I agree this testing is enough to ensure there is no big impact. I just
wanted to make sure there is some testing and that we are not just
adding stuff without worrying about performance. I'd like to keep QEMU
quick and add only overhead that is unavoidable, but I don't demand
that too much time be spent on that. If Julian is interested and does
more testing, that may give some interesting results for possible
optimisation, but if there is no time for that, this was enough to
measure the impact of this series.

Regards,
BALATON Zoltan

Regards,
Julian

[1]: https://github.com/patchew-project/qemu/tree/patchew/[email protected]
