https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124288
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Works with -fexcess-precision=standard or -mfpmath=sse.  Without either,
the test fails with:
4294967295 != 0 (4294967296.000000)
Aborted (core dumped)
The only loop we vectorize is

__attribute__ ((noinline, noclone)) void
f2ui (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    ui[i] = f[i];
}
which uses
.L2:
        vcvttps2udq     f(%eax), %xmm0
        addl    $16, %eax
        vmovdqa %xmm0, ui-16(%eax)
        cmpl    $4096, %eax
        jne     .L2
with -v4 and
f2ui:
.LFB22:
        .cfi_startproc
        vbroadcastss    .LC1, %xmm2
        xorl    %eax, %eax
        .p2align 6
        .p2align 4
        .p2align 3
.L2:
        vmovaps f(%eax), %xmm0
        addl    $16, %eax
        vcmpleps        %xmm0, %xmm2, %xmm1
        vandps  %xmm2, %xmm1, %xmm3
        vpslld  $31, %xmm1, %xmm1
        vsubps  %xmm3, %xmm0, %xmm0
        vcvttps2dq      %xmm0, %xmm0
        vpxor   %xmm1, %xmm0, %xmm0
        vmovdqa %xmm0, ui-16(%eax)
        cmpl    $4096, %eax
        jne     .L2
with -v3 (which works without -mfpmath=sse).
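
To make the difference between the two sequences concrete, here is a rough
scalar model of one vector lane for each.  This is only a sketch: the helper
names are hypothetical, it assumes .LC1 holds 2^31 (the constant is not shown
in the dump), and it relies on the documented out-of-range results of the
conversion instructions (0x80000000 for vcvttps2dq, 0xFFFFFFFF for
vcvttps2udq).

```c
#include <stdint.h>

/* Hypothetical model of one lane of vcvttps2dq: NaN or out-of-range
   inputs yield the "integer indefinite" value 0x80000000.  */
static uint32_t
model_cvttps2dq (float x)
{
  if (!(x >= -2147483648.0f && x < 2147483648.0f))
    return 0x80000000u;
  return (uint32_t) (int32_t) x;
}

/* One lane of the -v3 sequence, ASSUMING .LC1 is 2^31.  */
static uint32_t
model_v3_lane (float x)
{
  const float two31 = 2147483648.0f;
  int big = two31 <= x;                      /* vcmpleps */
  float bias = big ? two31 : 0.0f;           /* vandps */
  uint32_t flip = big ? 0x80000000u : 0u;    /* vpslld $31 */
  /* vsubps, vcvttps2dq, vpxor */
  return model_cvttps2dq (x - bias) ^ flip;
}

/* One lane of the -v4 sequence: vcvttps2udq returns all ones when the
   truncated value does not fit in 32 unsigned bits.  */
static uint32_t
model_v4_lane (float x)
{
  if (!(x > -1.0f && x < 4294967296.0f))
    return 0xFFFFFFFFu;
  return (uint32_t) x;
}
```

Under these assumptions both models agree on in-range inputs such as
4294967040.0f, but for 4294967296.0f (2^32) the -v3 model wraps to 0 while
the -v4 model saturates to 4294967295.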
Instrumenting with
  f2ui ();
  for (i = 0; i < 1024; i++)
    {
      fprintf (stderr, "%i %u != %u (%f)\n", i, ui[i], (unsigned int)f[i],
               f[i]);
      if (ui[i] != (__typeof (ui[0]))f[i])
        abort ();
    }
With -v4:
892 4294967040 != 4294967040 (4294967040.000000)
893 4294967040 != 4294967040 (4294967040.000000)
894 4294967040 != 4294967040 (4294967040.000000)
895 4294967295 != 0 (4294967296.000000)
Aborted (core dumped)
with -v3:
892 4294967040 != 4294967040 (4294967040.000000)
893 4294967040 != 4294967040 (4294967040.000000)
894 4294967040 != 4294967040 (4294967040.000000)
895 0 != 0 (4294967296.000000)
896 0 != 0 (4294967296.000000)
...
It seems the testcase computes fltmax in an odd way and we run into
saturation, possibly hitting undefined float-to-unsigned-int conversions
(overflow)?  Possibly the test should use __FLT_MAX__ and friends instead
of that weird computation.
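
For illustration only, a range check along these lines would keep such a
test away from the undefined conversions.  It is a sketch, not a proposed
patch: the macro and function names are hypothetical, and it assumes a
32-bit unsigned int and IEEE single-precision float, using FLT_MANT_DIG
from <float.h> to guard the latter.

```c
#include <float.h>
#include <limits.h>

_Static_assert (FLT_MANT_DIG == 24, "sketch assumes IEEE single precision");
_Static_assert (UINT_MAX == 0xFFFFFFFFu, "sketch assumes 32-bit unsigned int");

/* Largest float below 2^32, i.e. 2^32 - 2^8 = 4294967040.  */
#define MAX_FLOAT_BELOW_UINT_RANGE 0x1.fffffep+31f

/* Nonzero if converting X to unsigned int is defined in C: the truncated
   value must lie in [0, UINT_MAX].  NaN fails both comparisons.  */
static int
float_fits_unsigned (float x)
{
  /* UINT_MAX + 1.0 is exactly 2^32 in double.  */
  return x > -1.0f && x < (double) UINT_MAX + 1.0;
}
```

With this check, 4294967040.0f is accepted, while 4294967296.0f (the value
printed at index 895 above) and FLT_MAX are rejected.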
Jakub, you wrote the testcase.