Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-23 Thread Richard Henderson

On 6/23/24 14:27, Alexander Monakov wrote:

> Hello,
>
> On Wed, 12 Jun 2024, Paolo Bonzini wrote:
>
>> I didn't do this because of RHEL9, I did it because it's silly that
>> QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
>> compute the x86 parity flag (and POPCNT was introduced at the same
>> time as SSE4.2).
>
> I do not see where the 2% figure is coming from: even considering that
> the 256-byte LUT may take an extra cache line due to misalignment, 320
> bytes is still less than 1% of 32KB L1D size.
>
> More importantly, the way this comment is phrased made me think that Qemu
> eagerly computes PF. But the comment in target/i386/cpu.h is saying that
> all flags are computed in an on-demand manner. Considering that software
> pretty much never uses PF, why would the parity table be resident in L1D?
> As far as I can see, the cost is rather a cache miss and perhaps a TLB miss
> when PF is computed (mostly when EFLAGS are accessed all together on
> context switches I think).
>
> Is there something I'm not seeing?


We delay flags computation until they're needed (since flags are often overwritten by the 
very next instruction), but when we do, we compute all of the flags.  So PF is computed at 
that point, even if PF itself will never be read.



r~



Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-23 Thread Alexander Monakov
Hello,

On Wed, 12 Jun 2024, Paolo Bonzini wrote:

> I didn't do this because of RHEL9, I did it because it's silly that
> QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> compute the x86 parity flag (and POPCNT was introduced at the same
> time as SSE4.2).

I do not see where the 2% figure is coming from: even considering that
the 256-byte LUT may take an extra cache line due to misalignment, 320
bytes is still less than 1% of 32KB L1D size.

More importantly, the way this comment is phrased made me think that Qemu
eagerly computes PF. But the comment in target/i386/cpu.h is saying that
all flags are computed in an on-demand manner. Considering that software
pretty much never uses PF, why would the parity table be resident in L1D?
As far as I can see, the cost is rather a cache miss and perhaps a TLB miss
when PF is computed (mostly when EFLAGS are accessed all together on
context switches I think).

Is there something I'm not seeing?

Thanks.
Alexander



Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 7:00 PM Daniel P. Berrangé  wrote:
> > I guess that, because these helpers are called by TCG, you wouldn't
> > pay the price of the indirect call. However, adding all this
> > infrastructure for 13-15 year old CPUs is not very enthralling.
>
> Rather than re-introducing a runtime check again for everyone, could
> we make it a configure time argument whether to assume x86_64-v2 ?

Fair enough, I'll work on it.

Paolo




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 04:09:29PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 12, 2024 at 01:21:26PM +0100, Daniel P. Berrangé wrote:
> > On Wed, Jun 12, 2024 at 01:51:31PM +0200, Paolo Bonzini wrote:
> > > On Wed, Jun 12, 2024 at 1:38 PM Daniel P. Berrangé  wrote:
> > > > This isn't anything to do with the distro installer. The use case is that
> > > > the distro wants all its software to be able to run on the x86_64 baseline
> > > > it has chosen to build with.
> > > 
> > > Sure, and they can patch the packages if their wish is not shared by
> > > upstream. Alternatively they can live with the fact that not all users
> > > will be able to use all packages, which is probably already the case.
> > 
> > Yep, there are almost certainly scientific packages that have done
> > optimizations in their builds. QEMU is slightly more special
> > though because it is classed as a "critical path" package for
> > the distro. Even the QEMU linux-user pieces are now critical path,
> > since they're leveraged by docker & podman for running foreign arch
> > containers.
> > 
> > > Or drop QEMU, I guess. Has FESCo ever expressed how strict they are
> > > and which of the three options they'd pick?
> > 
> > I don't know - I'm going to raise this question to find out if
> > there's any guidance.
> 
> I learnt that FESCo approved a surprisingly loose rule saying
> 
>   "Libraries packaged in Fedora may require ISA extensions,
>    however any packaged application must not crash on any
>    officially supported architecture, either by providing
>    a generic fallback implementation OR by cleanly exiting
>    when the requisite hardware support is unavailable."
>

..snip..

I queried the looseness of this wording, and it is suggested
it wasn't intended to apply to existing packages, just newly
added ones. By that interpretation it wouldn't be valid for
QEMU, and we'd be pushed towards the revert downstream, to
retain a runtime check for the feature. I really hate the
idea of keeping a revert of these patches downstream though,
as it would be an indefinite rebase headache.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 01:51:31PM +0200, Paolo Bonzini wrote:
> On Wed, Jun 12, 2024 at 1:38 PM Daniel P. Berrangé  wrote:
> > If we want to use POPCNT in the TCG code, can we not do a runtime check
> > and selectively build pieces of code with __attribute__((target("popcnt"))),
> > as we've done historically for the bufferiszero.c code, rather than
> > changing the entire QEMU baseline ?
> 
> bufferiszero.c has a very quick check in front of the indirect call
> and runs for several hundred clock cycles, so the tradeoff is
> different there.
> 
> I guess that, because these helpers are called by TCG, you wouldn't
> pay the price of the indirect call. However, adding all this
> infrastructure for 13-15 year old CPUs is not very enthralling.

Ah, so the distinction is that the old code had a runtime check
on 'have_popcnt' (and similar), whereas now that check is eliminated
at compile time, since the condition is a constant.

Rather than re-introducing a runtime check again for everyone, could
we make it a configure time argument whether to assume x86_64-v2 ?
So those who are happy with an increased baseline can achieve the
maximum performance with all checks eliminated at compile time,
while still allowing the tradeoff of a dynamic check for those who
prefer compatibility over peak performance?

With regards,
Daniel




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 06:40:09PM +0300, Alexander Monakov wrote:
> 
> On Wed, 12 Jun 2024, Daniel P. Berrangé wrote:
> 
> > I learnt that FESCo approved a surprisingly loose rule saying
> > 
> >   "Libraries packaged in Fedora may require ISA extensions,
> >    however any packaged application must not crash on any
> >    officially supported architecture, either by providing
> >    a generic fallback implementation OR by cleanly exiting
> >    when the requisite hardware support is unavailable."
> > 
> > This might suggest we could put a runtime feature check in main(),
> > print a warning and then exit(1). However, QEMU has a lot of code
> > that is triggered from ELF constructors. If we're building the
> > entire QEMU codebase with extra features enabled, I worry that
> > the constructors could potentially cause an illegal instruction
> > crash before main() runs?
> 
> Are you literally suggesting to find a solution that satisfies the letter
> of Fedora rules, and not what's good for the spirit of a wider community?

I'm interested in exploring what the options are. Personally I still
think QEMU ought to maintain compat with the original x86_64 ABI, since
very few distros have moved to requiring -v2, but if that doesn't happen
I want to understand the implications for Fedora since that's where I'm
a maintainer.

With regards,
Daniel




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov

On Wed, 12 Jun 2024, Daniel P. Berrangé wrote:

> I learnt that FESCo approved a surprisingly loose rule saying
> 
>   "Libraries packaged in Fedora may require ISA extensions,
>    however any packaged application must not crash on any
>    officially supported architecture, either by providing
>    a generic fallback implementation OR by cleanly exiting
>    when the requisite hardware support is unavailable."
> 
> This might suggest we could put a runtime feature check in main(),
> print a warning and then exit(1). However, QEMU has a lot of code
> that is triggered from ELF constructors. If we're building the
> entire QEMU codebase with extra features enabled, I worry that
> the constructors could potentially cause an illegal instruction
> crash before main() runs?

Are you literally suggesting to find a solution that satisfies the letter
of Fedora rules, and not what's good for the spirit of a wider community?

Alexander

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 5:09 PM Daniel P. Berrangé  wrote:
> This might suggest we could put a runtime feature check in main(),
> print a warning and then exit(1). However, QEMU has a lot of code
> that is triggered from ELF constructors. If we're building the
> entire QEMU codebase with extra features enabled, I worry that
> the constructors could potentially cause an illegal instruction
> crash before main() runs?

And I learnt that one can simply add -mneeded to the compiler command
line to achieve that, at least on glibc systems:

$ gcc f.c -mneeded -mpopcnt
$ qemu-x86_64 -cpu core2duo ./a.out
./a.out: CPU ISA level is lower than required
$ qemu-x86_64 ./a.out
1234

$ gcc f.c -mneeded
$ qemu-x86_64 -cpu core2duo ./a.out
1234

Using "readelf -n" on the executable unveils the magic:

Displaying notes found in: .note.gnu.property
  Owner                Data size        Description
  GNU                  0x0030           NT_GNU_PROPERTY_TYPE_0
      Properties: x86 ISA needed: x86-64-baseline, x86-64-v2
      x86 feature used: x86
      x86 ISA used:

I'm actually amazed. :)

Paolo




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 01:21:26PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 12, 2024 at 01:51:31PM +0200, Paolo Bonzini wrote:
> > On Wed, Jun 12, 2024 at 1:38 PM Daniel P. Berrangé  wrote:
> > > This isn't anything to do with the distro installer. The use case is that
> > > the distro wants all its software to be able to run on the x86_64 baseline
> > > it has chosen to build with.
> > 
> > Sure, and they can patch the packages if their wish is not shared by
> > upstream. Alternatively they can live with the fact that not all users
> > will be able to use all packages, which is probably already the case.
> 
> Yep, there are almost certainly scientific packages that have done
> optimizations in their builds. QEMU is slightly more special
> though because it is classed as a "critical path" package for
> the distro. Even the QEMU linux-user pieces are now critical path,
> since they're leveraged by docker & podman for running foreign arch
> containers.
> 
> > Or drop QEMU, I guess. Has FESCo ever expressed how strict they are
> > and which of the three options they'd pick?
> 
> I don't know - I'm going to raise this question to find out if
> there's any guidance.

I learnt that FESCo approved a surprisingly loose rule saying

  "Libraries packaged in Fedora may require ISA extensions,
   however any packaged application must not crash on any
   officially supported architecture, either by providing
   a generic fallback implementation OR by cleanly exiting
   when the requisite hardware support is unavailable."

This might suggest we could put a runtime feature check in main(),
print a warning and then exit(1). However, QEMU has a lot of code
that is triggered from ELF constructors. If we're building the
entire QEMU codebase with extra features enabled, I worry that
the constructors could potentially cause an illegal instruction
crash before main() runs?

With regards,
Daniel




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov

On Wed, 12 Jun 2024, Paolo Bonzini wrote:

> On Wed, Jun 12, 2024 at 3:34 PM Alexander Monakov  wrote:
> > On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > > > I found out from the mailing list. My Core2-based desktop would be affected.
> > >
> > > Do you run QEMU on it? With KVM or TCG?
> >
> > Excuse me? Are you going to ask for SSH access to ensure my computer really
> > exists and is in working order?
> 
> Come on. The thing is, I'm not debating the existence of computers
> that don't have x86_64-v2, but I *am* debating the usefulness of
> making QEMU run on them and any extra information can be interesting.

I think it will be useful to me, with KVM and TCG both.

> > Can you tell me why you never commented on buffer_is_zero improvements, where
> > v1 was sent in October?  Just trying to understand how you care for 2% of L1D
> > use but could be OK with those kinds of speedups being dropped on the floor.
> 
> I'm not sure if there is any overlap in the scenarios where
> buffer_is_zero performance matters, and x86 emulation. People can care
> about thing A but not thing B. If there's anything that you think I
> can help reviewing, feel free to let me know offlist.

In that case I would've appreciated an early indication you're not interested,
making Cc'ing you on followups unnecessary.

Alexander

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 3:34 PM Alexander Monakov  wrote:
> On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > > I found out from the mailing list. My Core2-based desktop would be affected.
> >
> > Do you run QEMU on it? With KVM or TCG?
>
> Excuse me? Are you going to ask for SSH access to ensure my computer really
> exists and is in working order?

Come on. The thing is, I'm not debating the existence of computers
that don't have x86_64-v2, but I *am* debating the usefulness of
making QEMU run on them and any extra information can be interesting.

> Can you tell me why you never commented on buffer_is_zero improvements, where
> v1 was sent in October?  Just trying to understand how you care for 2% of L1D
> use but could be OK with those kinds of speedups being dropped on the floor.

I'm not sure if there is any overlap in the scenarios where
buffer_is_zero performance matters, and x86 emulation. People can care
about thing A but not thing B. If there's anything that you think I
can help reviewing, feel free to let me know offlist.

Paolo




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov


On Wed, 12 Jun 2024, Paolo Bonzini wrote:

> > I found out from the mailing list. My Core2-based desktop would be affected.
> 
> Do you run QEMU on it? With KVM or TCG?

Excuse me? Are you going to ask for SSH access to ensure my computer really
exists and is in working order?

Can you tell me why you never commented on buffer_is_zero improvements, where
v1 was sent in October?  Just trying to understand how you care for 2% of L1D
use but could be OK with those kinds of speedups being dropped on the floor.

Alexander



Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 01:51:31PM +0200, Paolo Bonzini wrote:
> On Wed, Jun 12, 2024 at 1:38 PM Daniel P. Berrangé  wrote:
> > This isn't anything to do with the distro installer. The use case is that
> > the distro wants all its software to be able to run on the x86_64 baseline
> > it has chosen to build with.
> 
> Sure, and they can patch the packages if their wish is not shared by
> upstream. Alternatively they can live with the fact that not all users
> will be able to use all packages, which is probably already the case.

Yep, there are almost certainly scientific packages that have done
optimizations in their builds. QEMU is slightly more special
though because it is classed as a "critical path" package for
the distro. Even the QEMU linux-user pieces are now critical path,
since they're leveraged by docker & podman for running foreign arch
containers.

> Or drop QEMU, I guess. Has FESCo ever expressed how strict they are
> and which of the three options they'd pick?

I don't know - I'm going to raise this question to find out if
there's any guidance.

> Either way, this only affects either the QEMU maintainers for the
> distro, or the users of QEMU. It's only if the installation media used
> QEMU, that this change would be actively blocking usage of the distro
> on old processors.



With regards,
Daniel




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 2:11 PM Alexander Monakov  wrote:
>
>
> On Wed, 12 Jun 2024, Paolo Bonzini wrote:
>
> > Ahah, nice. :) I'm pretty sure that, when I tested "pf =
> > (__builtin_popcount(x) & 1) * 4;", it was generating a call to
> > __popcountsi2.
>
> Why write '__builtin_popcount(x) & 1' when you can write
> '__builtin_parity(x)' in the first place?

I don't remember. :) Anyhow, I will probably add __builtin_parity() to
include/qemu/host-utils.h and some kind of #ifdef HAVE_FAST_CTPOP.
Thanks.

> > Still - for something that has a code generator, there _is_ a cost in
> > supporting old CPUs, so I'd rather avoid reverting this. The glibc bug
> > that you linked is very different not just because it affected 32-bit
> > installation media, but also because it was a bug rather than
> > intentional.
> >
> > Since you are reporting this issue, how did you find out / what broke for 
> > you?
>
> I found out from the mailing list. My Core2-based desktop would be affected.

Do you run QEMU on it? With KVM or TCG?

Paolo




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov


On Wed, 12 Jun 2024, Paolo Bonzini wrote:

> Ahah, nice. :) I'm pretty sure that, when I tested "pf =
> (__builtin_popcount(x) & 1) * 4;", it was generating a call to
> __popcountsi2.

Why write '__builtin_popcount(x) & 1' when you can write
'__builtin_parity(x)' in the first place? 

> Still - for something that has a code generator, there _is_ a cost in
> supporting old CPUs, so I'd rather avoid reverting this. The glibc bug
> that you linked is very different not just because it affected 32-bit
> installation media, but also because it was a bug rather than
> intentional.
> 
> Since you are reporting this issue, how did you find out / what broke for you?

I found out from the mailing list. My Core2-based desktop would be affected.

Last but not least, I'm sympathetic to the efforts of my distro maintainers,
who I imagine would be put in an uncomfortable position by this change.

Alexander



Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 1:46 PM Alexander Monakov  wrote:
>
>
> On Wed, 12 Jun 2024, Paolo Bonzini wrote:
>
> > On Wed, Jun 12, 2024 at 1:19 PM Alexander Monakov  wrote:
> > > On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > > > I didn't do this because of RHEL9, I did it because it's silly that
> > > > QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> > > > compute the x86 parity flag (and POPCNT was introduced at the same
> > > > time as SSE4.2).
> > >
> > > From looking at that POPCNT patch I understood that Qemu detects
> > > presence of POPCNT at runtime and will only use the fallback when
> > > POPCNT is unavailable. Did I misunderstand?
> >
> > -mpopcnt allows GCC to generate the POPCNT instruction for helper
> > code. Right now we have code like this in
> > target/i386/tcg/cc_helper_template.h:
> >
> >     pf = parity_table[(uint8_t)dst];
> >
> > and it could be instead something like
> >
> >     #if defined __i386__ || defined __x86_64__ || defined __s390x__ || \
> >         defined __riscv_zbb
>
> GCC also predefines __POPCNT__ when -mpopcnt is active, so that would be
> available for ifdef testing like above, but...
>
> >     static inline unsigned int compute_pf(uint8_t x)
> >     {
> >         return __builtin_parity(x) * CC_P;
> >     }
> >     #else
> >     extern const uint8_t parity_table[256];
> >     static inline unsigned int compute_pf(uint8_t x)
> >     {
> >         return parity_table[x];
> >     }
> >     #endif
> >
> > The code generated for __builtin_parity, if you don't have it
> > available in hardware, is pretty bad.
>
> On x86 parity _is_ available in baseline ISA, no? Here's what gcc-14 generates:
>
>         xor     eax, eax
>         test    dil, dil
>         setnp   al
>         sal     eax, 2

Ahah, nice. :) I'm pretty sure that, when I tested "pf =
(__builtin_popcount(x) & 1) * 4;", it was generating a call to
__popcountsi2.

Still - for something that has a code generator, there _is_ a cost in
supporting old CPUs, so I'd rather avoid reverting this. The glibc bug
that you linked is very different not just because it affected 32-bit
installation media, but also because it was a bug rather than
intentional.

Since you are reporting this issue, how did you find out / what broke for you?

Paolo




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 1:38 PM Daniel P. Berrangé  wrote:
> This isn't anything to do with the distro installer. The use case is that
> the distro wants all its software to be able to run on the x86_64 baseline
> it has chosen to build with.

Sure, and they can patch the packages if their wish is not shared by
upstream. Alternatively they can live with the fact that not all users
will be able to use all packages, which is probably already the case.
Or drop QEMU, I guess. Has FESCo ever expressed how strict they are
and which of the three options they'd pick?

Either way, this only affects either the QEMU maintainers for the
distro, or the users of QEMU. It's only if the installation media used
QEMU, that this change would be actively blocking usage of the distro
on old processors.

> If we want to use POPCNT in the TCG code, can we not do a runtime check
> and selectively build pieces of code with  __attribute__((target("popcnt"))),
> as we've done historically for the bufferiszero.c code, rather than
> changing the entire QEMU baseline ?

bufferiszero.c has a very quick check in front of the indirect call
and runs for several hundred clock cycles, so the tradeoff is
different there.

I guess that, because these helpers are called by TCG, you wouldn't
pay the price of the indirect call. However, adding all this
infrastructure for 13-15 year old CPUs is not very enthralling.

Paolo




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov

On Wed, 12 Jun 2024, Paolo Bonzini wrote:

> On Wed, Jun 12, 2024 at 1:19 PM Alexander Monakov  wrote:
> > On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > > I didn't do this because of RHEL9, I did it because it's silly that
> > > QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> > > compute the x86 parity flag (and POPCNT was introduced at the same
> > > time as SSE4.2).
> >
> > From looking at that POPCNT patch I understood that Qemu detects
> > presence of POPCNT at runtime and will only use the fallback when
> > POPCNT is unavailable. Did I misunderstand?
> 
> -mpopcnt allows GCC to generate the POPCNT instruction for helper
> code. Right now we have code like this in
> target/i386/tcg/cc_helper_template.h:
> 
>     pf = parity_table[(uint8_t)dst];
>
> and it could be instead something like
>
>     #if defined __i386__ || defined __x86_64__ || defined __s390x__ || \
>         defined __riscv_zbb

GCC also predefines __POPCNT__ when -mpopcnt is active, so that would be
available for ifdef testing like above, but...

>     static inline unsigned int compute_pf(uint8_t x)
>     {
>         return __builtin_parity(x) * CC_P;
>     }
>     #else
>     extern const uint8_t parity_table[256];
>     static inline unsigned int compute_pf(uint8_t x)
>     {
>         return parity_table[x];
>     }
>     #endif
> 
> The code generated for __builtin_parity, if you don't have it
> available in hardware, is pretty bad.

On x86 parity _is_ available in baseline ISA, no? Here's what gcc-14 generates:

        xor     eax, eax
        test    dil, dil
        setnp   al
        sal     eax, 2

and with -mpopcnt:

        movsx   eax, dil
        popcnt  eax, eax
        and     eax, 1
        sal     eax, 2

Alexander

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 01:12:43PM +0200, Paolo Bonzini wrote:
> On Wed, Jun 12, 2024 at 1:04 PM Daniel P. Berrangé  wrote:
> >
> > On Wed, Jun 12, 2024 at 01:55:20PM +0300, Alexander Monakov wrote:
> > > Hello,
> > >
> > > I'm sending straightforward reverts to recent patches that bumped minimum
> > > required x86 instruction set to SSE4.2. The older chips did not stop working,
> > > and people still test and use new software on older hardware:
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=31867
> > >
> > > Considering the very minor gains from the baseline raise, I'm honestly not
> > > sure why it happened. It seems better to let distributions handle that.
> >
> > Indeed distros are opinionated about the x86_64 baseline they want
> > to target.
> >
> > While RHEL-9 switched to a x86_64-v2 baseline, Fedora has repeatedly
> > rejected the idea of moving to an x86_64-v2 baseline, wanting to retain
> > full backwards compat. So this assumption in QEMU is preventing the
> > distros from satisfying their chosen build target goals.
> 
> I didn't do this because of RHEL9, I did it because it's silly that
> QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> compute the x86 parity flag (and POPCNT was introduced at the same
> time as SSE4.2).
> 
> Intel x86_64-v2 processors have been around for about 15 years, AMD
> for a little less (2011). I'd rather hear from users about the
> use cases for running QEMU on such old processors before reverting, as
> this does not get in the way of booting/installing distros on old
> machines. Unless QEMU is run from within the installation media, which
> it isn't, requiring a particular processor family does not prevent
> Fedora from being installable on pre-v2 processors.

This isn't anything to do with the distro installer. The use case is that
the distro wants all its software to be able to run on the x86_64 baseline
it has chosen to build with.

If we want to use POPCNT in the TCG code, can we not do a runtime check
and selectively build pieces of code with  __attribute__((target("popcnt"))),
as we've done historically for the bufferiszero.c code, rather than
changing the entire QEMU baseline ?


With regards,
Daniel




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 1:19 PM Alexander Monakov  wrote:
> On Wed, 12 Jun 2024, Paolo Bonzini wrote:
> > I didn't do this because of RHEL9, I did it because it's silly that
> > QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> > compute the x86 parity flag (and POPCNT was introduced at the same
> > time as SSE4.2).
>
> From looking at that POPCNT patch I understood that Qemu detects
> presence of POPCNT at runtime and will only use the fallback when
> POPCNT is unavailable. Did I misunderstand?

-mpopcnt allows GCC to generate the POPCNT instruction for helper
code. Right now we have code like this in
target/i386/tcg/cc_helper_template.h:

    pf = parity_table[(uint8_t)dst];

and it could be instead something like

    #if defined __i386__ || defined __x86_64__ || defined __s390x__ || \
        defined __riscv_zbb
    static inline unsigned int compute_pf(uint8_t x)
    {
        return __builtin_parity(x) * CC_P;
    }
    #else
    extern const uint8_t parity_table[256];
    static inline unsigned int compute_pf(uint8_t x)
    {
        return parity_table[x];
    }
    #endif

The code generated for __builtin_parity, if you don't have it
available in hardware, is pretty bad.

Paolo




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov


On Wed, 12 Jun 2024, Paolo Bonzini wrote:

> I didn't do this because of RHEL9, I did it because it's silly that
> QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> compute the x86 parity flag (and POPCNT was introduced at the same
> time as SSE4.2).

From looking at that POPCNT patch I understood that Qemu detects
presence of POPCNT at runtime and will only use the fallback when
POPCNT is unavailable. Did I misunderstand?

Alexander



Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov

On Wed, 12 Jun 2024, Daniel P. Berrangé wrote:

> On Wed, Jun 12, 2024 at 01:55:20PM +0300, Alexander Monakov wrote:
> > Hello,
> > 
> > I'm sending straightforward reverts to recent patches that bumped minimum
> > required x86 instruction set to SSE4.2. The older chips did not stop working,
> > and people still test and use new software on older hardware:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=31867
> > 
> > Considering the very minor gains from the baseline raise, I'm honestly not
> > sure why it happened. It seems better to let distributions handle that.
> 
> Indeed distros are opinionated about the x86_64 baseline they want
> to target.
> 
> While RHEL-9 switched to a x86_64-v2 baseline, Fedora has repeatedly
> rejected the idea of moving to an x86_64-v2 baseline, wanting to retain
> full backwards compat. So this assumption in QEMU is preventing the
> distros from satisfying their chosen build target goals.

So, to make sure I parsed that correctly, you're in support of the reverts?

Alexander

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 1:04 PM Daniel P. Berrangé  wrote:
>
> On Wed, Jun 12, 2024 at 01:55:20PM +0300, Alexander Monakov wrote:
> > Hello,
> >
> > I'm sending straightforward reverts to recent patches that bumped minimum
> > required x86 instruction set to SSE4.2. The older chips did not stop working,
> > and people still test and use new software on older hardware:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=31867
> >
> > Considering the very minor gains from the baseline raise, I'm honestly not
> > sure why it happened. It seems better to let distributions handle that.
>
> Indeed distros are opinionated about the x86_64 baseline they want
> to target.
>
> While RHEL-9 switched to a x86_64-v2 baseline, Fedora has repeatedly
> rejected the idea of moving to an x86_64-v2 baseline, wanting to retain
> full backwards compat. So this assumption in QEMU is preventing the
> distros from satisfying their chosen build target goals.

I didn't do this because of RHEL9, I did it because it's silly that
QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
compute the x86 parity flag (and POPCNT was introduced at the same
time as SSE4.2).

Intel x86_64-v2 processors have been around for about 15 years, AMD
for a little less (2011). I'd rather hear from users about the
use cases for running QEMU on such old processors before reverting, as
this does not get in the way of booting/installing distros on old
machines. Unless QEMU is run from within the installation media, which
it isn't, requiring a particular processor family does not prevent
Fedora from being installable on pre-v2 processors.

Paolo




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 01:55:20PM +0300, Alexander Monakov wrote:
> Hello,
> 
> I'm sending straightforward reverts to recent patches that bumped minimum
> required x86 instruction set to SSE4.2. The older chips did not stop working,
> and people still test and use new software on older hardware:
> https://sourceware.org/bugzilla/show_bug.cgi?id=31867
> 
> Considering the very minor gains from the baseline raise, I'm honestly not
> sure why it happened. It seems better to let distributions handle that.

Indeed distros are opinionated about the x86_64 baseline they want
to target.

While RHEL-9 switched to a x86_64-v2 baseline, Fedora has repeatedly
rejected the idea of moving to an x86_64-v2 baseline, wanting to retain
full backwards compat. So this assumption in QEMU is preventing the
distros from satisfying their chosen build target goals.

> Alexander Monakov (5):
>   Revert "host/i386: assume presence of POPCNT"
>   Revert "host/i386: assume presence of SSSE3"
>   Revert "host/i386: assume presence of SSE2"
>   Revert "host/i386: assume presence of CMOV"
>   Revert "meson: assume x86-64-v2 baseline ISA"
> 
>  host/include/i386/host/cpuinfo.h |  3 +++
>  meson.build  | 10 +++---
>  tcg/i386/tcg-target.c.inc| 15 ++-
>  tcg/i386/tcg-target.h|  5 +++--
>  util/bufferiszero.c  |  4 ++--
>  util/cpuinfo-i386.c  |  7 +--
>  6 files changed, 30 insertions(+), 14 deletions(-)
> 
> -- 
> 2.32.0
> 
> 

With regards,
Daniel