[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2024-07-15 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #38 from Mayshao-oc at zhaoxin dot com ---
vmovdqu is also atomic in Zhaoxin processors if it meets three requirements:
1. the address of its memory operand must be 16-byte aligned
2. vmovdqu is vex.128 not vex.256
3. the memory type of the address is WB
(In reply to Richard Henderson from comment #36)
> (In reply to Mayshao-oc from comment #34)
> > (In reply to Jakub Jelinek from comment #17)
> > > Fixed for AMD on the library side too.
> > > We need a statement from Zhaoxin and VIA for their CPUs.
> > 
> > Sorry for the late reply.
> > We guarantee that VMOVDQA will be an atomic load or store provided 128 bit
> > aligned address in Zhaoxin processors, provided that the memory type is WB.
> > Can we extend this patch to Zhaoxin processors as well?
> 
> Is VMOVDQU atomic, provided the address is aligned in Zhaoxin processors?
> 
> In QEMU, we make use of this additional guarantee from AMD.
> We also reference this gcc bugzilla entry for documentation.  :-)

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2024-07-15 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #37 from Mayshao-oc at zhaoxin dot com ---
vmovdqu is also atomic in Zhaoxin processors if it meets three requirements:
1. the address of its memory operand must be 16-byte aligned
2. vmovdqu is vex.128 not vex.256
3. the memory type of the address is WB
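
Of the three requirements, only the first can be checked from portable C. A minimal sketch of that check follows; the helper name `is_16_byte_aligned` and the buffer `aligned_buf` are hypothetical. The instruction encoding (VEX.128 rather than VEX.256) is chosen by the compiler, and the memory type (WB) is not visible to user space, so this covers alignment only.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p meets the 16-byte alignment requirement.
   It says nothing about the instruction encoding or the memory type. */
static bool is_16_byte_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* An object that is 16-byte aligned by construction. */
static alignas(16) unsigned char aligned_buf[32];
```

With these definitions, `aligned_buf` and `aligned_buf + 16` pass the check, while `aligned_buf + 1` does not.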

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2024-07-10 Thread rth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #36 from Richard Henderson  ---
(In reply to Mayshao-oc from comment #34)
> (In reply to Jakub Jelinek from comment #17)
> > Fixed for AMD on the library side too.
> > We need a statement from Zhaoxin and VIA for their CPUs.
> 
> Sorry for the late reply.
> We guarantee that VMOVDQA will be an atomic load or store provided 128 bit
> aligned address in Zhaoxin processors, provided that the memory type is WB.
> Can we extend this patch to Zhaoxin processors as well?

Is VMOVDQU atomic, provided the address is aligned in Zhaoxin processors?

In QEMU, we make use of this additional guarantee from AMD.
We also reference this gcc bugzilla entry for documentation.  :-)

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2024-07-10 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #35 from Uroš Bizjak  ---
(In reply to Mayshao-oc from comment #34)
> Can we extend this patch to Zhaoxin processors as well?

Just post the enablement patch to gcc-patches@ mailing list.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2024-07-09 Thread Mayshao-oc at zhaoxin dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #34 from Mayshao-oc at zhaoxin dot com ---
(In reply to Jakub Jelinek from comment #17)
> Fixed for AMD on the library side too.
> We need a statement from Zhaoxin and VIA for their CPUs.

Sorry for the late reply.
We guarantee that VMOVDQA will be an atomic load or store provided 128 bit
aligned address in Zhaoxin processors, provided that the memory type is WB.
Can we extend this patch to Zhaoxin processors as well?

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2023-02-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #33 from Segher Boessenkool  ---
Yes, exactly.  It was the X server I think?  I try to forget such horrors :-)

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2023-02-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #32 from Andrew Pinski  ---
(In reply to Segher Boessenkool from comment #31)
> Yes, there was a user who incorrectly used memcpy on non-memory memory.

From what I remember (it was also reported about aarch64 at one point too), one
of the graphics libraries would call memcpy from normal memory to GPU memory
(over PCIe), and memcpy will sometimes use unaligned accesses, which cause a
fault on the GPU memory.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2023-02-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #31 from Segher Boessenkool  ---
Yes, there was a user who incorrectly used memcpy on non-memory memory.

This is not valid, and never has been.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2023-02-15 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #30 from Florian Weimer  ---
(In reply to Segher Boessenkool from comment #29)
> (In reply to Florian Weimer from comment #28)
> > Maybe this belongs in the ABI manual? For example, the POWER ABI says that
> > memcpy needs to work on device memory.
> 
> Huh?!
> 
> Where do you see this?  The way you state it, it is trivially impossible to
> implement, so if we really say that, it needs fixing asap.

I thought I had an explicit documented reference somewhere, but for now, all we
have is an undocumented requirement (so not a good example in the context of
this bug at all):

[PATCH] powerpc: Use aligned stores in memset


(There's also a CPU quirk in this area, but I think this wasn't about that.)

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2023-02-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #29 from Segher Boessenkool  ---
(In reply to Florian Weimer from comment #28)
> Maybe this belongs in the ABI manual? For example, the POWER ABI says that
> memcpy needs to work on device memory.

Huh?!

Where do you see this?  The way you state it, it is trivially impossible to
implement, so if we really say that, it needs fixing asap.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-29 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #28 from Florian Weimer  ---
(In reply to Peter Cordes from comment #27)
> (In reply to Alexander Monakov from comment #26)
> > Sure, the right course of action seems to be to simply document that atomic
> > types and built-ins are meant to be used on "common" (writeback) memory
> 
> Agreed.  Where in the manual should this go?  Maybe a new subsection of the
> chapter about __atomic builtins where we document per-ISA requirements for
> them to actually work?

Maybe this belongs in the ABI manual? For example, the POWER ABI says that
memcpy needs to work on device memory. Documenting the required memory types
for atomics seems along the same lines.

The rules are also potentially different for different targets sharing the same
processor architecture.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #27 from Peter Cordes  ---
(In reply to Alexander Monakov from comment #26)
> Sure, the right course of action seems to be to simply document that atomic
> types and built-ins are meant to be used on "common" (writeback) memory

Agreed.  Where in the manual should this go?  Maybe a new subsection of the
chapter about __atomic builtins where we document per-ISA requirements for them
to actually work?

e.g. x86 memory-type stuff, and that ARM assumes all cores are in the same
inner-shareable cache-coherency domain, thus barriers are   dmb ish   not  dmb
sy and so on.
I guess we might want to avoid documenting the actual asm implementation
strategies in the main manual, because that would imply it's supported to make
assumptions based on that.

Putting it near the __atomic docs might make it easier for readers to notice
that the list of requirements exists, vs. scattering them into different pages
for different ISAs.  And we don't currently have any section in the manual
about per-ISA quirks or requirements, just about command-line options,
builtins, and attributes that are per-ISA, so there's no existing page where
this could get tacked on.

This would also be a place where we can document that __atomic ops are
address-free when they're lock-free, and thus usable on shared memory between
processes.  ISO C++ says that *should* be the case for std::atomic, but
doesn't standardize the existence of multiple processes.

To avoid undue worry, documentation about this should probably start by saying
that normal programs (running under mainstream OSes) don't have to worry about
it or do anything special.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #26 from Alexander Monakov  ---
Sure, the right course of action seems to be to simply document that atomic
types and built-ins are meant to be used on "common" (writeback) memory, and no
guarantees can be given otherwise, because it would involve platform specifics
(relaxed ordering of WC writes as you say; tearing by PCI bridges and device
interfaces seems like another possible caveat).

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #25 from Peter Cordes  ---
(In reply to Alexander Monakov from comment #24)
> 
> I think it's possible to get UC/WC mappings via a graphics/compute API (e.g.
> OpenGL, Vulkan, OpenCL, CUDA) on any OS if you get a mapping to device
> memory (and then CPU vendor cannot guarantee that 128b access won't tear
> because it might depend on downstream devices).


Even atomic_int doesn't work properly if you deref a pointer to WC memory.  WC
doesn't have the same ordering guarantees, so it would break acquire/release
semantics.
So we already don't support WC for this.

We do at least de-facto support atomics on UC memory because the ordering
guarantees are a superset of cacheable memory, and 8-byte atomicity for aligned
load/store is guaranteed even for non-cacheable memory types since P5 Pentium
(and on AMD).  (And lock cmpxchg16b is always atomic even on UC memory.)

But you're right that only Intel guarantees that 16-byte VMOVDQA loads/stores
would be atomic on UC memory.  So this change could break that very unwise
corner-case on AMD which only guarantees that for cacheable loads/stores, and
Zhaoxin only for WB.

But was anyone previously using 16-byte atomics on UC device memory?  Do we
actually care about supporting that?  I'd guess no and no, so it's just a
matter of documenting that somewhere.

Since GCC7 we've reported 16-byte atomics as being non-lock-free, so I *hope*
people weren't using __atomic_store_n on device memory.  The underlying
implementation was never guaranteed.
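
What the compiler reports for a given size can be queried with the `__atomic_always_lock_free` built-in. The sketch below assumes GCC or a compatible compiler; the wrapper names are hypothetical, and the result for 16 bytes depends on the compiler version and target, so no particular answer is assumed.

```c
#include <stdbool.h>

/* Compile-time answer to "is an N-byte atomic always lock-free?".
   The second argument 0 means "assume a typically aligned object". */
static bool always_lock_free_8(void)
{
    return __atomic_always_lock_free(8, 0);
}

static bool always_lock_free_16(void)
{
    return __atomic_always_lock_free(16, 0);
}
```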

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #24 from Alexander Monakov  ---
(In reply to Peter Cordes from comment #23)
> But at least on Linux, I don't think there's a way for user-space to even
> ask for a page of WT or WP memory (or UC or WC).  Only WB memory is easily
> available without hacking the kernel.  As far as I know, this is true on
> other existing OSes.

I think it's possible to get UC/WC mappings via a graphics/compute API (e.g.
OpenGL, Vulkan, OpenCL, CUDA) on any OS if you get a mapping to device memory
(and then CPU vendor cannot guarantee that 128b access won't tear because it
might depend on downstream devices).

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-28 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

Peter Cordes  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #23 from Peter Cordes  ---
(In reply to Xi Ruoyao from comment #20)
> "On Zhaoxin CPUs with AVX, the VMOVDQA instruction is atomic if the accessed
> memory is Write Back, but it's not guaranteed for other memory types."

VMOVDQA is still fine, I think: WB is the only memory type that's relevant for
atomics, at least on the mainstream OSes we compile for.  It's not normally
possible for user-space to allocate memory of other types.  Kernels normally
use WB memory for their shared data, too.

You're correct that WT and WP are the other two cacheable memory types, and
Zhaoxin's statement doesn't explicitly guarantee atomicity for those, unlike
Intel and AMD.

But at least on Linux, I don't think there's a way for user-space to even ask
for a page of WT or WP memory (or UC or WC).  Only WB memory is easily
available without hacking the kernel.  As far as I know, this is true on other
existing OSes.

WT = write-through: read caching, no write-allocate.  Write hits update the
line and memory.
WP = write-protect: read caching, no write-allocate.  Writes go around the
cache, evicting even on hit.
(https://stackoverflow.com/questions/65953033/whats-the-usecase-of-write-protected-pat-memory-type
quotes the Intel definitions.)

Until recently, the main work on formalizing the x86 TSO memory model had only
looked at WB memory.
A 2022 paper looked at WT, UC, and WC memory types:
https://dl.acm.org/doi/pdf/10.1145/3498683 - Extending Intel-x86 Consistency
and Persistency
Formalising the Semantics of Intel-x86 Memory Types and Non-temporal Stores
(The intro part describing memory types is quite readable, in plain English not
full of formal symbols.  They only mention WP once, but tested some litmus
tests with readers and writers using any combination of the other memory
types.)


Some commenters on my answer on when WT is ever used or useful confirmed that
mainstream OSes don't give easy access to it.
https://stackoverflow.com/questions/61129142/when-use-write-through-cache-policy-for-pages/61130838#61130838
* Linux has never merged a patch to let user-space allocate WT pages.
* The Windows kernel reportedly doesn't have a mechanism to keep track of pages
that should be WT or WP, so you won't find any.

I don't know about *BSD making it plausible for user-space to point an _Atomic
int * at a page of WT or WP memory.  I'd guess not.

I don't know if there's anywhere we can document that _Atomic objects need to
be in memory that's allocated in a "normal" way.  Probably hard to word without
accidentally disallowing something that's fine.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-23 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #22 from Xi Ruoyao  ---
(In reply to Jakub Jelinek from comment #21)
> What about loads?  That is even more important than the stores.  While
> atomic store can be worst case done through cmpxchg16b, even when it is
> slower, we can't use cmpxchg16b on atomic load because we don't know if the
> memory isn't read-only.

Loads are also atomic for WB.

> As for the Write Back only vs. other types, doesn't that match the
> " for cacheable" in the AMD statement?

If I read the manual correctly, Write Back, Write Through, and Write Protected
are all "cacheable".  Mayshao told me VMOVDQA is atomic for WB, but not atomic
for UC and WC (they are not cacheable so I think we don't need to take care). 
So how about WT and WP?

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-23 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #21 from Jakub Jelinek  ---
What about loads?  That is even more important than the stores.  While atomic
store can be worst case done through cmpxchg16b, even when it is slower, we
can't use cmpxchg16b on atomic load because we don't know if the memory isn't
read-only.
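
To illustrate why a compare-and-swap is a poor substitute for a load, here is a sketch of a CAS-based read. It uses a 64-bit object so that it builds without extra flags or libatomic; the 16-byte case with cmpxchg16b follows the same pattern. The name `load_via_cas` is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: read *p using only compare-and-swap.
   If *p == 0 the CAS succeeds and stores 0 back; otherwise it fails and
   copies the current value into expected.  Either way the underlying
   instruction needs write access to *p, so it faults on read-only memory. */
static uint64_t load_via_cas(_Atomic uint64_t *p)
{
    uint64_t expected = 0;
    atomic_compare_exchange_strong(p, &expected, 0);
    return expected;
}
```

The value is never changed by this read, but the page still has to be writable, which a true load such as an aligned vmovdqa does not require.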
As for the Write Back only vs. other types, doesn't that match the
" for cacheable" in the AMD statement?

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-23 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #20 from Xi Ruoyao  ---
From Mayshao (Zhaoxin engineer):

"On Zhaoxin CPUs with AVX, the VMOVDQA instruction is atomic if the accessed
memory is Write Back, but it's not guaranteed for other memory types."

Is it allowed to use VMOVDQA then?

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #19 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:60880f3afc82f55b834643e449883dd5b6ad057a

commit r11-10385-g60880f3afc82f55b834643e449883dd5b6ad057a
Author: Jakub Jelinek 
Date:   Tue Nov 15 08:14:45 2022 +0100

libatomic: Handle AVX+CX16 AMD like Intel for 16b atomics [PR104688]

We got a response from AMD in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
so the following patch starts treating AMD with AVX and CMPXCHG16B
ISAs like Intel by using vmovdqa for atomic load/store in libatomic.
We still don't have confirmation from Zhaoxin and VIA (anything else
with CPUs featuring AVX and CX16?).

2022-11-15  Jakub Jelinek  

PR target/104688
* config/x86/init.c (__libat_feat1_init): Don't clear
bit_AVX on AMD CPUs.

(cherry picked from commit 4a7a846687e076eae58ad3ea959245b2bf7fdc07)

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #18 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:86dea99d8525bf49d51636332d6be440e51b931a

commit r12-8920-g86dea99d8525bf49d51636332d6be440e51b931a
Author: Jakub Jelinek 
Date:   Tue Nov 15 08:14:45 2022 +0100

libatomic: Handle AVX+CX16 AMD like Intel for 16b atomics [PR104688]

We got a response from AMD in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
so the following patch starts treating AMD with AVX and CMPXCHG16B
ISAs like Intel by using vmovdqa for atomic load/store in libatomic.
We still don't have confirmation from Zhaoxin and VIA (anything else
with CPUs featuring AVX and CX16?).

2022-11-15  Jakub Jelinek  

PR target/104688
* config/x86/init.c (__libat_feat1_init): Don't clear
bit_AVX on AMD CPUs.

(cherry picked from commit 4a7a846687e076eae58ad3ea959245b2bf7fdc07)

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #17 from Jakub Jelinek  ---
Fixed for AMD on the library side too.
We need a statement from Zhaoxin and VIA for their CPUs.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #16 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:4a7a846687e076eae58ad3ea959245b2bf7fdc07

commit r13-4048-g4a7a846687e076eae58ad3ea959245b2bf7fdc07
Author: Jakub Jelinek 
Date:   Tue Nov 15 08:14:45 2022 +0100

libatomic: Handle AVX+CX16 AMD like Intel for 16b atomics [PR104688]

We got a response from AMD in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
so the following patch starts treating AMD with AVX and CMPXCHG16B
ISAs like Intel by using vmovdqa for atomic load/store in libatomic.
We still don't have confirmation from Zhaoxin and VIA (anything else
with CPUs featuring AVX and CX16?).

2022-11-15  Jakub Jelinek  

PR target/104688
* config/x86/init.c (__libat_feat1_init): Don't clear
bit_AVX on AMD CPUs.
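
The idea behind this change can be sketched as a pure function over CPUID results. This is not the libatomic source: the names `pack4` and `mask_feat1` are hypothetical, and comparing only the ECX part of the vendor string is an assumption of this sketch. The bit positions are those of CPUID leaf 1 ECX (AVX is bit 28, CMPXCHG16B is bit 13).

```c
#include <stdint.h>

#define BIT_CMPXCHG16B (1u << 13)  /* CPUID.1:ECX bit 13 */
#define BIT_AVX        (1u << 28)  /* CPUID.1:ECX bit 28 */

/* Pack four ASCII characters the way CPUID leaf 0 returns vendor text. */
static uint32_t pack4(const char *s)
{
    return (uint32_t)(unsigned char)s[0]
         | (uint32_t)(unsigned char)s[1] << 8
         | (uint32_t)(unsigned char)s[2] << 16
         | (uint32_t)(unsigned char)s[3] << 24;
}

/* Hypothetical stand-in for the vendor check: given the last four vendor
   characters (CPUID.0:ECX) and the leaf-1 ECX feature bits, hide AVX
   unless the vendor has confirmed 16-byte vmovdqa atomicity. */
static uint32_t mask_feat1(uint32_t vendor_ecx, uint32_t feat_ecx)
{
    const uint32_t both = BIT_AVX | BIT_CMPXCHG16B;

    if ((feat_ecx & both) == both
        && vendor_ecx != pack4("ntel")      /* GenuineIntel */
        && vendor_ecx != pack4("cAMD"))     /* AuthenticAMD */
        feat_ecx &= ~BIT_AVX;
    return feat_ecx;
}
```

Clearing the AVX bit makes the 16-byte paths fall back to cmpxchg16b on CPUs whose vendor has not made a statement.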

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #15 from Alexander Monakov  ---
Ah, there will be an mfence after the vmovdqa when necessary for an atomic
store, thanks (I missed that because the testcase doesn't scan for mfence).
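
The distinction between tearing and ordering can be sketched with C11 atomics. A `long` is used here for portability; the 16-byte case is analogous. The names `shared_val`, `store_seq_cst`, `store_release` and `load_acquire` are hypothetical.

```c
#include <stdatomic.h>

static _Atomic long shared_val;

/* Sequentially consistent store: on x86 the plain store must be followed
   by a full barrier such as mfence (or be done with xchg), because TSO
   lets a later load complete before the store is globally visible. */
static void store_seq_cst(long v)
{
    atomic_store_explicit(&shared_val, v, memory_order_seq_cst);
}

/* Release store: a plain x86 store already provides this ordering,
   so no extra barrier is needed. */
static void store_release(long v)
{
    atomic_store_explicit(&shared_val, v, memory_order_release);
}

static long load_acquire(void)
{
    return atomic_load_explicit(&shared_val, memory_order_acquire);
}
```

Atomicity of the access itself (no tearing) is what the vendor statements cover; the barrier is a separate matter that depends on the requested memory order.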

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #14 from Jakub Jelinek  ---
For ordering guarantees I assume (already since the r12-7689 change) that
VMOVDQA behaves the same as MOVL/MOVQ.
This PR was about whether there is a guarantee that VMOVDQA will be an atomic
load or store provided a 128-bit aligned address.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #13 from Alexander Monakov  ---
Jakub, sorry if I misunderstood the patches from a brief glance, but what
ordering guarantees are you assuming for AVX accesses? It should not be
SEQ_CST. I think what the Intel manual is saying is that said accesses will not
tear, but reordering is the same as under the pre-existing x86 TSO rules (a load
can finish before an earlier store is globally visible).

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

--- Comment #12 from Jakub Jelinek  ---
I've posted the patches (so far only lightly tested):
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606021.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606022.html
It is still Sunday in AoE, so we still have stage1 there.

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-13 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

Xi Ruoyao  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|gcc and libatomic can use   |gcc and libatomic can use
                   |SSE for 128-bit atomic      |SSE for 128-bit atomic
                   |loads on Intel CPUs with    |loads on Intel and AMD CPUs
                   |AVX                         |with AVX
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-11-14