https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122818
Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX
--- Comment #3 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
https://godbolt.org/z/YMb6Y7TMs shows a fairly minimal example of the
fixed-size mask conversions. If the optimizer were able to see through all of
the operations, it would compile to a simple memcpy.
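The following sequence is a no-op on the mask: it extracts the mask bits into
a scalar register and then rebuilds the same vector mask from those bits: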
        vmovmskps       eax, ymm0       // ymm0 is known to be a mask
        movzx           eax, al
        vmovd           xmm0, eax
        vpbroadcastd    ymm0, xmm0
        vpand           ymm0, ymm0, YMMWORD PTR .LC0[rip]
        vpxor           xmm2, xmm2, xmm2
        vpcmpgtd        ymm0, ymm0, ymm2
.LC0:
        .long   1
        .long   2
        .long   4
        .long   8
        .long   16
        .long   32
        .long   64
        .long   128
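
To make the no-op claim concrete, here is a minimal scalar sketch that models
the sequence above lane by lane. It is not taken from the Godbolt example; the
names Mask8, movmsk, roundtrip and roundtrip_is_identity are hypothetical, and
the model assumes (as the comment on the first instruction states) that every
input lane is either 0 or -1.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a ymm register holding an 8 x 32-bit mask:
// every lane is assumed to be either 0 (false) or -1 (true).
using Mask8 = std::array<std::int32_t, 8>;

// Models vmovmskps: bit i of the result is the sign bit of lane i.
std::uint32_t movmsk(const Mask8& m) {
    std::uint32_t bits = 0;
    for (int i = 0; i < 8; ++i)
        if (m[i] < 0) bits |= 1u << i;
    return bits;
}

// Models the full instruction sequence, one lane at a time.
Mask8 roundtrip(const Mask8& in) {
    for (std::int32_t lane : in)
        assert(lane == 0 || lane == -1);   // "ymm0 is known to be a mask"

    std::uint32_t eax = movmsk(in);        // vmovmskps eax, ymm0
    eax &= 0xFFu;                          // movzx eax, al

    Mask8 out{};
    for (int i = 0; i < 8; ++i) {
        // vmovd + vpbroadcastd: every lane receives a copy of eax
        std::int32_t lane = static_cast<std::int32_t>(eax);
        lane &= (1 << i);                  // vpand with .LC0 (1, 2, 4, ..., 128)
        out[i] = (lane > 0) ? -1 : 0;      // vpcmpgtd against the zeroed register
    }
    return out;
}

// Checks all 256 possible 8-lane masks.
bool roundtrip_is_identity() {
    for (unsigned bits = 0; bits < 256; ++bits) {
        Mask8 m{};
        for (int i = 0; i < 8; ++i)
            m[i] = ((bits >> i) & 1u) ? -1 : 0;
        if (roundtrip(m) != m) return false;
    }
    return true;
}
```

Under that assumption roundtrip_is_identity() returns true, i.e. the output
mask always equals the input mask. The identity depends on the 0 / -1
precondition, which the optimizer would have to know in order to drop the
sequence.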
I think it is out of scope for the optimizer to recognize patterns like this
(which is why I never reported them). We need an abstraction at a higher level.