https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118952
Bug ID: 118952
Summary: AArch64 get_fpcr and set_fpcr builtins don't block
reordering of operations past them
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
The __builtin_aarch64_set_fpcr and __builtin_aarch64_get_fpcr builtins are not
as useful in practice as we'd like. For the input code:
#include <stdint.h>
#include <string.h>
uint64_t foo (uint32_t *in_fpcr, uint32_t src) {
uint64_t dst;
uint32_t saved_fpcr;
float fsrc;
saved_fpcr = __builtin_aarch64_get_fpcr();
__builtin_aarch64_set_fpcr(*in_fpcr);
memcpy(&fsrc, &src, 4);
double d = (double) fsrc;
memcpy(&dst, &d, 8);
*in_fpcr = __builtin_aarch64_get_fpcr();
__builtin_aarch64_set_fpcr(saved_fpcr);
return dst;
}
at -O2 we get:
foo:
fmov s31, w1
mrs x1, fpcr
ldr w2, [x0]
msr fpcr, x2
mrs x2, fpcr
str w2, [x0]
msr fpcr, x1
fcvt d31, s31
fmov x0, d31
ret
The problem is that the fcvt is moved outside the region that has a modified
FPCR, defeating the purpose of the builtins.
I initially thought this was the RTL insn scheduler moving the operations but
the RTL patterns for the builtins do use unspec_volatile that is supposed to
prevent such movement.
But the problem seems to be at expand-time. The GIMPLE looks correct:
saved_fpcr_6 = __builtin_aarch64_get_fpcr ();
_1 = *in_fpcr_7(D);
__builtin_aarch64_set_fpcr (_1);
_14 = VIEW_CONVERT_EXPR<float>(src_9(D));
_2 = (double) _14;
_10 = VIEW_CONVERT_EXPR<unsigned long>(_2);
_3 = __builtin_aarch64_get_fpcr ();
*in_fpcr_7(D) = _3;
__builtin_aarch64_set_fpcr (saved_fpcr_6);
return _10;
but the RTL generation is:
(insn 2 5 3 2 (set (reg/v/f:DI 107 [ in_fpcr ])
(reg:DI 0 x0 [ in_fpcr ])) "fpcr.c":4:48 -1
(nil))
(insn 3 2 4 2 (set (reg/v:SI 108 [ src ])
(reg:SI 1 x1 [ src ])) "fpcr.c":4:48 -1
(nil))
(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 4 8 2 (set (reg/v:SI 104 [ saved_fpcr ])
(unspec_volatile:SI [
(const_int 0 [0])
] UNSPECV_GET_FPCR)) "fpcr.c":9:18 -1
(nil))
(insn 8 7 9 2 (set (reg:SI 109)
(mem:SI (reg/v/f:DI 107 [ in_fpcr ]) [1 *in_fpcr_7(D)+0 S4 A32]))
"fpcr.c":10:5 -1
(nil))
(insn 9 8 10 2 (unspec_volatile [
(reg:SI 109)
] UNSPECV_SET_FPCR) "fpcr.c":10:5 -1
(nil))
(insn 10 9 11 2 (set (reg:SI 103 [ _3 ])
(unspec_volatile:SI [
(const_int 0 [0])
] UNSPECV_GET_FPCR)) "fpcr.c":16:16 -1
(nil))
(insn 11 10 12 2 (set (mem:SI (reg/v/f:DI 107 [ in_fpcr ]) [1 *in_fpcr_7(D)+0
S4 A32])
(reg:SI 103 [ _3 ])) "fpcr.c":16:14 discrim 1 -1
(nil))
(insn 12 11 13 2 (unspec_volatile [
(reg/v:SI 104 [ saved_fpcr ])
] UNSPECV_SET_FPCR) "fpcr.c":17:5 -1
(nil))
(insn 13 12 14 2 (set (reg:DF 111 [ _2 ])
(float_extend:DF (subreg:SF (reg/v:SI 108 [ src ]) 0))) "fpcr.c":13:16
-1
(nil))
(insn 14 13 18 2 (set (reg:DI 106 [ <retval> ])
(subreg:DI (reg:DF 111 [ _2 ]) 0)) "fpcr.c":18:12 -1
(nil))
insn 13 has been moved past the GET_FPCR and SET_FPCR builtins. Is that
something the out-of-ssa code is doing?