On Wed, 10 Sep 2025, Alex Bennée wrote:
"Julian Ganz" <[email protected]> writes:
September 10, 2025 at 12:06 PM, "BALATON Zoltan" wrote:
On Tue, 9 Sep 2025, Julian Ganz wrote:
I ran streamPPCpowerpcO3 on qemu with these patches:
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:               2867.6  0.056828     0.055795     0.061792
Scale:              1057.5  0.153282     0.151305     0.158115
Add:                1308.8  0.187095     0.183380     0.193672
Triad:              1111.6  0.220863     0.215902     0.230440
-------------------------------------------------------------
After doing a clean build, with the fans still audible:
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:               2932.9  0.055131     0.054554     0.055667
Scale:              1067.9  0.151520     0.149832     0.155000
Add:                1324.9  0.184807     0.181150     0.191386
Triad:              1122.0  0.220080     0.213896     0.229302
-------------------------------------------------------------
What was different between the above two runs? I guess maybe one is with
plugins disabled but it's not clear from the description.
The difference is nothing but a clean rebuild of qemu. As you see
there are fluctuations already. Plugins are enabled for both cases.
On qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
patches, but plugins enabled:
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:               2972.1  0.054407     0.053834     0.054675
Scale:              1068.6  0.151503     0.149726     0.154594
Add:                1327.6  0.185160     0.180784     0.193181
Triad:              1127.2  0.219249     0.212915     0.229230
-------------------------------------------------------------
And on qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
patches, with plugins disabled:
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:               2983.4  0.055141     0.053630     0.060013
Scale:              1058.3  0.152353     0.151186     0.155072
Add:                1323.9  0.184707     0.181279     0.188868
Triad:              1128.2  0.218674     0.212734     0.230314
-------------------------------------------------------------
I fail to see any significant indication that these patches, or
plugins in general, would result in a degradation of performance.
With the worst-case Copy test it seems to be about 3.5% (and about 1.7%
with plugins disabled?) and should normally be less than that, so it
does not add much more overhead to plugins than there already is and
should be acceptable. It may still be interesting to see whether the
overhead with plugins disabled can be avoided in a way similar to how
logging does it.
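For reference, the percentages quoted above can be reproduced from the
Copy best rates posted in this thread. The following is only a sketch:
which runs were actually compared is an assumption (this pairing happens
to give roughly 3.5% and 1.7%), and `copy_rates` and `slowdown_percent`
are hypothetical names used for illustration.

```python
# Best-rate figures (MB/s) for the Copy test, copied from the tables above.
# The labels are made up for this sketch.
copy_rates = {
    "patched, first run": 2867.6,
    "patched, clean build": 2932.9,
    "unpatched, plugins enabled": 2972.1,
    "unpatched, plugins disabled": 2983.4,
}


def slowdown_percent(baseline, measured):
    """Relative drop of `measured` below `baseline`, in percent."""
    return (baseline - measured) / baseline * 100.0


# Assumed pairings: each patched run against one unpatched baseline.
pairs = [
    ("unpatched, plugins enabled", "patched, first run"),
    ("unpatched, plugins disabled", "patched, clean build"),
]

for base, run in pairs:
    pct = slowdown_percent(copy_rates[base], copy_rates[run])
    print(f"{run} vs {base}: {pct:.1f}% slower")
```

Note that the two patched runs differ from each other by more than 2%
in Copy, so either figure depends heavily on which run is picked.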
The thing is: that's probably just usual fluctuations. As you have seen
with the first two measurements the values fluctuate quite a bit between
runs of the test on the very same qemu (assuming that a clean build did
not incur any _other_ relevant change). For example, the best rate for
scale shown with plugins enabled is one percent faster than with plugins
disabled. Is this significant? Probably not. Or at least it doesn't make
much sense.
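The argument above can be checked numerically against the posted best
rates. The sketch below treats the spread between the two runs of the
patched tree as a crude noise estimate and compares it with the
difference between plugins enabled and disabled on the unpatched tree.
The names `best_rates` and `rel_diff_percent` and the run labels are
hypothetical, and two samples per configuration are far too few for a
real statistical statement.

```python
# Best rates (MB/s) copied from the four tables in this thread.
best_rates = {
    "patched_run1":          {"Copy": 2867.6, "Scale": 1057.5, "Add": 1308.8, "Triad": 1111.6},
    "patched_run2":          {"Copy": 2932.9, "Scale": 1067.9, "Add": 1324.9, "Triad": 1122.0},
    "unpatched_plugins_on":  {"Copy": 2972.1, "Scale": 1068.6, "Add": 1327.6, "Triad": 1127.2},
    "unpatched_plugins_off": {"Copy": 2983.4, "Scale": 1058.3, "Add": 1323.9, "Triad": 1128.2},
}


def rel_diff_percent(value, reference):
    """Signed difference of `value` relative to `reference`, in percent."""
    return (value - reference) / reference * 100.0


for test in ("Copy", "Scale", "Add", "Triad"):
    # Spread between two runs of the same patched tree: a rough noise estimate.
    noise = abs(rel_diff_percent(best_rates["patched_run2"][test],
                                 best_rates["patched_run1"][test]))
    # Plugins enabled vs. disabled on the unpatched tree.
    effect = rel_diff_percent(best_rates["unpatched_plugins_on"][test],
                              best_rates["unpatched_plugins_off"][test])
    verdict = "within" if abs(effect) <= noise else "outside"
    print(f"{test:6s} noise {noise:5.2f}%  plugins on vs off {effect:+6.2f}%  ({verdict} noise)")
```

A positive value in the second column means the run with plugins
enabled reported the higher rate, which is the case for Scale.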
I wouldn't spend too much time chasing this down. As you say this
fluctuation is well within the noise range.
I can recommend hyperfine as a runner:
https://github.com/sharkdp/hyperfine
as it does some work on how many times you need to run a test before the
results are statistically relevant.
I may do some more tests this week, with runtimes longer than a few
seconds if I can find the motivation to set up everything I'd need to
compile your benchmark. In the meantime, you are welcome to make your
own measurements if you want to. The patches are also available at [1]
if you don't want to apply them to your local tree yourself.
Balaton,
I don't think worries about performance impact are justified and Julian
has certainly done enough due diligence here. If you can come up with a
repeatable test that shows a measurable impact then please do so.
I agree this testing is enough to ensure there is no big impact. I just
wanted to make sure there is some testing and not just adding stuff
without worrying about performance. I'd like to keep QEMU quick and add
only unavoidable overhead where possible, but I don't insist on spending
too much time on that. If Julian gets interested and does more testing,
that may give some interesting results for possible optimisation, but if
there is no time for that, this was enough to measure the impact of this
series.
Regards,
BALATON Zoltan
Regards,
Julian
[1]:
https://github.com/patchew-project/qemu/tree/patchew/[email protected]