On Wed, Jun 12, 2024 at 01:51:31PM +0200, Paolo Bonzini wrote:
> On Wed, Jun 12, 2024 at 1:38 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
> > If we want to use POPCNT in the TCG code, can we not do a runtime check
> > and selectively build pieces of code with __attribute__((target("popcnt"))),
> > as we've done historically for the bufferiszero.c code, rather than
> > changing the entire QEMU baseline ?
>
> bufferiszero.c has a very quick check in front of the indirect call
> and runs for several hundred clock cycles, so the tradeoff is
> different there.
>
> I guess that, because these helpers are called by TCG, you wouldn't
> pay the price of the indirect call. However, adding all this
> infrastructure for 13-15 year old CPUs is not very enthralling.

Ah, so the distinction is that the old code had a runtime check on
'have_popcnt' (and similar), whereas now that check is eliminated at
compile time, since the condition is a constant.

Rather than re-introducing a runtime check for everyone, could we
make it a configure-time argument whether to assume x86_64-v2?
Those who are happy with an increased baseline could then achieve
maximum performance with all checks eliminated at compile time, while
those who prefer compatibility over peak performance would still get
the tradeoff of a dynamic check.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
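
To make the configure-time idea concrete, here is a rough sketch, not a patch. CONFIG_ASSUME_X86_64_V2 is a hypothetical macro that configure would define when the user opts in to the higher baseline, and the popcount64* helpers are invented for illustration. Only 'have_popcnt' echoes the name mentioned above, and its definition here is an assumption rather than the one in the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable fallback that needs no special CPU support. */
static int popcount64_generic(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Only this function is built with POPCNT enabled. */
__attribute__((target("popcnt")))
static int popcount64_hw(uint64_t x)
{
    return __builtin_popcountll(x);
}

# ifdef CONFIG_ASSUME_X86_64_V2   /* hypothetical configure-time switch */
/* Constant condition: the compiler can drop the fallback branch. */
#  define have_popcnt true
# else
/* Dynamic check for builds that keep the older baseline. */
#  define have_popcnt (__builtin_cpu_supports("popcnt") != 0)
# endif
#else
/* Non-x86 or non-GNU compilers: always take the generic path. */
# define have_popcnt false
static int popcount64_hw(uint64_t x)
{
    return popcount64_generic(x);
}
#endif

int popcount64(uint64_t x)
{
    return have_popcnt ? popcount64_hw(x) : popcount64_generic(x);
}
```

With the macro defined (and presumably -march=x86-64-v2 passed as well), the condition folds to a constant and only the POPCNT path remains. Without it, the same source keeps the dynamic check.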