https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118949
Vineet Gupta <vineetg at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
--- Comment #6 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
(In reply to Vineet Gupta from comment #4)
> Also slightly better test so avoid cpp/installed headers and use bare cc1
>
> void func(const float *a, const float *b, float *c)
> {
>   for (long i = 0; i < 1024; ++i) {
>     float a_l = __builtin_lround(a[i]);
>     float b_l = __builtin_lround(b[i]);
>     c[i] = a_l + b_l;
>   }
> }
Interestingly, the codegen for this test differs between a baremin cc1 and one
from a glibc toolchain build.
In the toolchain version, cc1 is able to transform
float output = (float) long __builtin_lround ((double) float input )
to
float output = (float) long __builtin_lroundf (float input )
Due to
TARGET_LIBC_HAS_FUNCTION linux_libc_has_function
So with cc1 what we see is:
W/ gdc0dea98c96e02c       | Revert gdc0dea98c96e02c
                          |
fsrmi 4                   | fsrmi 4
vfwcvt.f.f.v v2,v4        | vfwcvt.f.f.v v2,v4
vfwcvt.f.f.v v4,v1        | vfwcvt.f.f.v v4,v1
                          |
vfcvt.x.f.v v2,v2         | vfcvt.x.f.v v2,v2
                          | vfcvt.x.f.v v4,v4
fsrm a5                   |
vfncvt.f.x.w v2,v2        | fsrm a4
                          | vfncvt.f.x.w v2,v2
fsrmi 4                   | vfncvt.f.x.w v4,v4
vfcvt.x.f.v v4,v4         |
                          |
fsrm a5                   |
vfncvt.f.x.w v4,v4        |
Please note that for this test, I don't see any tree dump delta.
The first diff is in expand, and even that is just the insn from
floatrvvm2dirvvm1sf2 getting emitted earlier vs. it getting emitted later,
leading to the alternation of VFCVT and VFNCVT vs. them being lumped together.