On Wed, Jun 12, 2024 at 01:51:31PM +0200, Paolo Bonzini wrote:
> On Wed, Jun 12, 2024 at 1:38 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
> > If we want to use POPCNT in the TCG code, can we not do a runtime check
> > and selectively build pieces of code with __attribute__((target("popcnt"))),
> > as we've done historically for the bufferiszero.c code, rather than
> > changing the entire QEMU baseline ?
> 
> bufferiszero.c has a very quick check in front of the indirect call
> and runs for several hundred clock cycles, so the tradeoff is
> different there.
> 
> I guess that, because these helpers are called by TCG, you wouldn't
> pay the price of the indirect call. However, adding all this
> infrastructure for 13-15 year old CPUs is not very enthralling.

Ah, so the distinction is that the old code had a runtime check
on 'have_popcnt' (and similar), whereas now that check is eliminated
at compile time, since the condition is a constant.

Rather than reintroducing a runtime check for everyone, could
we make it a configure-time option whether to assume x86_64-v2?
Those who are happy with an increased baseline can then achieve
maximum performance with all checks eliminated at compile time,
while those who prefer compatibility over peak performance keep
the tradeoff of a dynamic check.
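
To illustrate roughly what I have in mind, here is a minimal standalone
sketch, not actual QEMU code. The macro ASSUME_X86_64_V2 is a hypothetical
name standing in for whatever configure would define, and the helper names
other than 'have_popcnt' are made up for the example. It assumes GCC or
Clang for the builtins and the target attribute.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUME_X86_64_V2 is a hypothetical macro that a configure option
 * would define; it is not an existing QEMU symbol.
 */
#if defined(ASSUME_X86_64_V2)
/* Constant condition: the compiler can discard the fallback branch. */
#define have_popcnt true
static void cpu_detect(void) { }
#else
/* Dynamic condition: filled in once at startup. */
static bool have_popcnt;

static void cpu_detect(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    have_popcnt = __builtin_cpu_supports("popcnt") != 0;
#else
    have_popcnt = false;
#endif
}
#endif

/* Portable fallback: clear the lowest set bit until none remain. */
static unsigned popcount32_generic(uint32_t x)
{
    unsigned n = 0;

    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

#if defined(__x86_64__) || defined(__i386__)
/* Only this function is built with POPCNT enabled. */
__attribute__((target("popcnt")))
static unsigned popcount32_hw(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}
#else
#define popcount32_hw popcount32_generic
#endif

static unsigned popcount32(uint32_t x)
{
    if (have_popcnt) {
        return popcount32_hw(x);
    }
    return popcount32_generic(x);
}
```

With the macro defined, 'have_popcnt' is a constant and the branch in
popcount32() can be folded away; without it, callers run cpu_detect()
once and pay for the check on each call.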

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

