[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #44 from Oleg Endo --- Author: olegendo Date: Sat Dec 13 13:17:55 2014 New Revision: 218707 URL: https://gcc.gnu.org/viewcvs?rev=218707&root=gcc&view=rev Log: gcc/testsuite/ PR target/53513 * gcc.target/sh/attr-isr-nosave_low_regs.c: Fix matching of expected register push/pop sequences. * gcc.target/sh/attr-isr.c: Likewise. * gcc.target/sh/attr-isr-trapa.c: Likewise. * gcc.target/sh/pragma-isr-nosave_low_regs.c: Likewise. * gcc.target/sh/pragma-isr-trapa.c: Likewise. * gcc.target/sh/pragma-isr-trapa2.c: Likewise. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/sh/attr-isr-nosave_low_regs.c trunk/gcc/testsuite/gcc.target/sh/attr-isr-trapa.c trunk/gcc/testsuite/gcc.target/sh/attr-isr.c trunk/gcc/testsuite/gcc.target/sh/pragma-isr-nosave_low_regs.c trunk/gcc/testsuite/gcc.target/sh/pragma-isr-trapa.c trunk/gcc/testsuite/gcc.target/sh/pragma-isr-trapa2.c
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #43 from Oleg Endo --- Author: olegendo Date: Wed Dec 10 08:31:32 2014 New Revision: 218563 URL: https://gcc.gnu.org/viewcvs?rev=218563&root=gcc&view=rev Log: gcc/ PR target/53513 * doc/extend.texi (__builtin_sh_set_fpscr): Fix typo. Modified: trunk/gcc/ChangeLog trunk/gcc/doc/extend.texi
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #42 from Oleg Endo --- Author: olegendo Date: Wed Dec 10 00:21:36 2014 New Revision: 218551 URL: https://gcc.gnu.org/viewcvs?rev=218551&root=gcc&view=rev Log: gcc/ PR target/53513 * doc/extend.texi (__builtin_sh_get_fpscr, __builtin_sh_get_fpscr): Document it. Modified: trunk/gcc/ChangeLog trunk/gcc/doc/extend.texi
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #41 from Oleg Endo --- Author: olegendo Date: Sun Dec 7 23:19:59 2014 New Revision: 218472 URL: https://gcc.gnu.org/viewcvs?rev=218472&root=gcc&view=rev Log: gcc/testsuite/ PR target/53513 * gcc.target/sh/pr54602-4.c: Fix matching of rte-nop sequence. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/sh/pr54602-4.c
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #40 from Oleg Endo --- Author: olegendo Date: Sat Oct 18 10:51:08 2014 New Revision: 216424 URL: https://gcc.gnu.org/viewcvs?rev=216424&root=gcc&view=rev Log: gcc/ PR target/53513 * config/sh/sh-modes.def (PSI): Remove. * config/sh/sh-protos.h (get_fpscr_rtx): Remove. * config/sh/sh.c (fpscr_rtx, get_fpscr_rtx): Remove. (sh_reorg): Remove commented out FPSCR code. (fpscr_set_from_mem): Use SImode instead of PSImode. Emit lds_fpscr insn instead of move insn. (sh_hard_regno_mode_ok): Return SImode for FPSCR. (sh_legitimate_address_p, sh_legitimize_reload_address): Remove PSImode handling. (sh_emit_mode_set): Emit lds_fpscr and sts_fpscr insns. (sh1_builtin_p): Uncomment. (SH_BLTIN_UV 25, SH_BLTIN_VU 26): New macros. (bdesc): Add __builtin_sh_get_fpscr and __builtin_sh_set_fpscr. * config/sh/sh/predicates.md (fpscr_operand): Simplify. (fpscr_movsrc_operand, fpscr_movdst_operand): New predicates. (general_movsrc_operand, general_movdst_operand): Disallow fpscr_operand. * config/sh/sh.md (FPSCR_FR): New constant. (push_fpscr): Emit sts_fpscr insn. (pop_fpscr): Emit lds_fpscr_insn. (movsi_ie): Disallow FPSCR operands. (fpu_switch, unnamed related split, extend_psi_si, truncate_si_psi): Remove insns. (lds_fpscr, sts_fpscr): New insns. (toggle_sz, toggle_pr): Use SImode for FPSCR_REG instead of PSImode. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/sh-modes.def trunk/gcc/config/sh/sh-protos.h trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #39 from Kazumoto Kojima --- (In reply to Oleg Endo from comment #38) > ... there are no new failures for -m4 -ml and -m4 -mb. I'm tempted to apply > it. Kaz, do you have any objections? I have no objection.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #38 from Oleg Endo --- (In reply to Oleg Endo from comment #37) > Created attachment 33751 [details] > Use SImode for FPSCR, add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr > > I'm now testing this patch. ... there are no new failures for -m4 -ml and -m4 -mb. I'm tempted to apply it. Kaz, do you have any objections?
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 Oleg Endo changed: What|Removed |Added Attachment #33745|0 |1 is obsolete|| --- Comment #37 from Oleg Endo --- Created attachment 33751 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33751&action=edit Use SImode for FPSCR, add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr I'm now testing this patch. It removes The PSImode for FPSCR completely and splits the fpu_switch insn into two separate lds_fpscr and sts_fpscr insns. This allows relaxing the insn dependencies. Also now that sts_fpscr becomes a single-set insn, it can be stuffed into delay slots. I've also removed the alternatives for FPSCR <-> memory, except the pre-dec and post-inc ones, which are needed for push and pop insns. As a consequence, a user initiated __builtin_sh_get_fpscr () to a memory will always be ferried though a general register. There is some room for improvement, but it goes into the direction of address-mode-selection optimization, which can be done later. When doing a __builtin_sh_set_fpscr (value) the compiler will always insert code to preserve the current FPSCR FR, SZ, PR mode bits. This always involves getting the current FPSCR into a general register first and then loading FPSCR from a general register. Thus we can omit FPSCR loads from memory for now.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #36 from Oleg Endo --- (In reply to Oleg Endo from comment #35) > Created attachment 33745 [details] > Use SImode for FPSCR, add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr > > So I ended up removing the usage of PSImode for FPSCR and using SImode > instead. I've tested this patch on r216173, together with the already > applied attachment 33727 [details] patch, with -m4 -ml and -m4 -mb. There > is one new failure: > > FAIL: gcc.c-torture/execute/20021120-1.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (internal compiler error) > > error: insn does not satisfy its constraints: > (insn 1491 1489 1176 5 (set (reg:SI 12 r12) > (plus:SI (reg:SI 13 r13) > (const_int 24 [0x18]))) 20021120-1.c:39 76 {*addsi3_compact} > (nil)) > > > Summary: > > insn 1173 stores the fpscr to r12 > > the mode switch (xor op) is done on r12 > > insn 1489 writes r12 (new fpscr value) on the stack > > insn 1491 tries to recalculate the stack address and put it into r12, to > satisfy the memory constraints of fpu_switch (which now accepts post-inc > load and simple register address). > > insn 1776 tries to load it from the stack > > > Reload generates a wrong insn addsi3 insn. Something similar was happening > in PR 55212, too. Disallowing memory operands (except pre-dec store, post-inc load for push,pop) in the fpu_switch pattern fixes that failure. Although it might result in less optimal code in some cases where user code wants to store the current fpscr in memory. On the other hand, the generated memory address code in such cases is not very good anyway. Thus it's probably better to drop that for now.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #35 from Oleg Endo --- Created attachment 33745 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33745&action=edit Use SImode for FPSCR, add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr So I ended up removing the usage of PSImode for FPSCR and using SImode instead. I've tested this patch on r216173, together with the already applied attachment 33727 patch, with -m4 -ml and -m4 -mb. There is one new failure: FAIL: gcc.c-torture/execute/20021120-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error) error: insn does not satisfy its constraints: (insn 1491 1489 1176 5 (set (reg:SI 12 r12) (plus:SI (reg:SI 13 r13) (const_int 24 [0x18]))) 20021120-1.c:39 76 {*addsi3_compact} (nil)) 20021120-1.c:58:1: internal compiler error: in reload_cse_simplify_operands, at postreload.c:415 0x8565017 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) ../../gcc-trunk2/gcc/rtl-error.c:110 0x8565055 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) ../../gcc-trunk2/gcc/rtl-error.c:121 0x850c0cc reload_cse_simplify_operands ../../gcc-trunk2/gcc/postreload.c:415 0x850e5ea reload_cse_simplify ../../gcc-trunk2/gcc/postreload.c:127 0x850e5ea reload_cse_regs_1 ../../gcc-trunk2/gcc/postreload.c:224 0x850e6a2 reload_cse_regs ../../gcc-trunk2/gcc/postreload.c:72 0x850e6a2 execute ../../gcc-trunk2/gcc/postreload.c:2348 At that place, reload it's trying to stuff with the fpu_switch insn: (insn 293 1480 1173 5 (parallel [ (set (mem/c:DF (reg:SI 12 r12) [4 %sfp+-344 S8 A32]) (reg:DF 70 fr6)) (use (reg:SI 154 fpscr0)) (clobber (scratch:SI)) ]) 20021120-1.c:39 289 {movdf_i4} (nil)) (insn 1173 293 1483 5 (parallel [ (set (reg:SI 12 r12) (reg/v:SI 151 )) (use (reg:SI 155 fpscr1)) (use (reg:SI 154 fpscr0)) (set (reg:SI 155 fpscr1) (unspec_volatile:SI [ (const_int 0 [0]) ] UNSPECV_FPSCR_STAT)) (set (reg:SI 154 fpscr0) (unspec_volatile:SI [ (const_int 0 [0]) ] UNSPECV_FPSCR_MODES)) ]) 20021120-1.c:39 434 {fpu_switch} (nil)) (insn 1483 1173 1484 5 (set (reg:SI 13 r13) (const_int 252 [0xfc])) 20021120-1.c:39 258 {movsi_ie} (nil)) (insn 1484 1483 1485 5 (set (reg:SI 13 r13) (plus:SI (reg:SI 13 r13) (reg/f:SI 15 r15))) 20021120-1.c:39 76 {*addsi3_compact} (expr_list:REG_EQUIV (plus:SI (reg/f:SI 15 r15) (const_int 252 [0xfc])) (nil))) (insn 1485 1484 1174 5 (set (mem/c:SI (plus:SI (reg:SI 13 r13) (const_int 24 [0x18])) [4 %sfp+-76 S4 A32]) (reg:SI 12 r12)) 20021120-1.c:39 258 {movsi_ie} (nil)) (note 1174 1485 1490 5 NOTE_INSN_DELETED) (insn 1490 1174 1175 5 (set (reg:SI 13 r13) (const_int 524288 [0x8])) 20021120-1.c:39 258 {movsi_ie} (nil)) (insn 1175 1490 1487 5 (set (reg:SI 12 r12) (xor:SI (reg:SI 12 r12) (reg:SI 13 r13))) 20021120-1.c:39 138 {*xorsi3_compact} (nil)) (insn 1487 1175 1488 5 (set (reg:SI 13 r13) (const_int 252 [0xfc])) 20021120-1.c:39 258 {movsi_ie} (nil)) (insn 1488 1487 1489 5 (set (reg:SI 13 r13) (plus:SI (reg:SI 13 r13) (reg/f:SI 15 r15))) 20021120-1.c:39 76 {*addsi3_compact} (expr_list:REG_EQUIV (plus:SI (reg/f:SI 15 r15) (const_int 252 [0xfc])) (nil))) (insn 1489 1488 1491 5 (set (mem/c:SI (plus:SI (reg:SI 13 r13) (const_int 24 [0x18])) [4 %sfp+-76 S4 A32]) (reg:SI 12 r12)) 20021120-1.c:39 258 {movsi_ie} (nil)) (insn 1491 1489 1176 5 (set (reg:SI 12 r12) (plus:SI (reg:SI 13 r13) (const_int 24 [0x18]))) 20021120-1.c:39 76 {*addsi3_compact} (nil)) (insn 1176 1491 1496 5 (parallel [ (set (reg/v:SI 151 ) (mem/c:SI (reg:SI 12 r12) [4 %sfp+-76 S4 A32])) (use (reg:SI 155 fpscr1)) (use (reg:SI 154 fpscr0)) (set (reg:SI 155 fpscr1) (unspec_volatile:SI [ (const_int 0 [0]) ] UNSPECV_FPSCR_STAT)) (set (reg:SI 154 fpscr0) (unspec_volatile:SI [ (const_int 0 [0]) ] UNSPECV_FPSCR_MODES)) ]) 20021120-1.c:39 434 {fpu_switch} (nil)) (insn 1496 1176 1497 5 (set (reg:SI 12 r12) (const_int 16 [0x10])) 20021120-1.c:39 258 {movsi_ie} (nil)) Summary: insn 1173 stores the fpscr to r12 the mode switch (xor op) is done on r12 insn 1489 writes r12 (new fpscr value) on the stack insn 1491 tries to recalculate the stack address and put it into r12, to satisfy the memory constraints of fpu_switch (which now accepts post-inc load and
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #34 from Oleg Endo --- Test run for -m2e -ml, -m2a -mb, -m2a-single -mb, -m4-single-ml has finished and shows no new failures (except the ISR stuff).
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #33 from Kazumoto Kojima --- (In reply to Oleg Endo from comment #32) > I'd propose the following for SH: > > unsigned int __builtin_sh_get_fpscr () > void __builtin_sh_set_fpscr (unsigned int) > > Any opinions or feedback regarding the matter? Looks fine to me.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #32 from Oleg Endo --- As a next step, I'd like to add built-in functions to get/set the FPSCR on SH. We have the old __get_fpscr and __set_fpscr functions in libgcc which modify the __fpscr_values array and the FPSCR. One idea would be to make __get_fpscr and __set_fpscr built-in functions, so that new code never generates the respective library calls. I don't know how expected/unexpected that would be for users and their existing code. For example, the current glibc does this in sysdeps/sh/sh4/fpu/fpu_control.h: #if defined __GNUC__ __BEGIN_DECLS /* GCC provides this function. */ extern void __set_fpscr (unsigned long); #define _FPU_SETCW(cw) __set_fpscr ((cw)) #else #define _FPU_SETCW(cw) __asm__ ("lds %0,fpscr" : : "r" (cw)) #endif By making __set_fpscr a builtin it would automagically work, but would never update the __fpscr_values. I'm not sure what kind of other fpscr related work arounds build on top of that. Alternatively, we could name the built-in functions differently. I've briefly checked the docs of other targets, this is what I've found: aarch64: unsigned int __builtin_aarch64_get_fpcr () void __builtin_aarch64_set_fpcr (unsigned int) unsigned int __builtin_aarch64_get_fpsr () void __builtin_aarch64_set_fpsr (unsigned int) ARM: unsigned int __builtin_arm_get_fpscr () void __builtin_arm_set_fpscr (unsigned int) MIPS: unsigned int __builtin_mips_get_fcsr (void) void __builtin_mips_set_fcsr (unsigned int value) I'd propose the following for SH: unsigned int __builtin_sh_get_fpscr () void __builtin_sh_set_fpscr (unsigned int) Any opinions or feedback regarding the matter?
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #31 from Oleg Endo --- Author: olegendo Date: Thu Oct 16 10:58:36 2014 New Revision: 216307 URL: https://gcc.gnu.org/viewcvs?rev=216307&root=gcc&view=rev Log: gcc/ PR target/53513 * config/sh/sh-protos.h (emit_sf_insn, emit_df_insn, expand_sf_unop, expand_sf_binop, expand_df_unop, expand_df_binop): Remove. * config/sh/sh.c (sh_emit_set_t_insn): Adjust generated insn pattern to match fp insn patterns. (calc_live_regs): Add FPSCR_MODES_REG and FPSCR_STAT_REG to the ignore list. (emit_sf_insn, emit_df_insn, expand_sf_unop, expand_sf_binop, expand_df_unop, expand_df_binop): Remove. (sh_conditional_register_usage): Mark FPSCR_MODES_REG and FPSCR_STAT_REG as not call clobbered. (sh_emit_mode_set): Emit fpscr store-modify-load sequence instead of invoking fpscr_set_from_mem. * config/sh/sh.h (MAX_REGISTER_NAME_LENGTH): Increase to 6. (SH_REGISTER_NAMES_INITIALIZER): Add names for FPSCR_MODES_REG and FPSCR_STAT_REG. (REGISTER_NAMES): Adjust. (SPECIAL_REGISTER_P): Add FPSCR_MODES_REG and FPSCR_STAT_REG. (FIRST_PSEUDO_REGISTER): Increase to 156. (DWARF_FRAME_REGISTERS): Define as 153 to keep the original value. (FIXED_REGISTERS, CALL_USED_REGISTERS): Add FPSCR_MODES_REG and FPSCR_STAT_REG. (REG_CLASS_CONTENTS): Adjust ALL_REGS bit mask to include FPSCR_MODES_REG and FPSCR_STAT_REG. (REG_ALLOC_ORDER): Add FPSCR_MODES_REG and FPSCR_STAT_REG. * config/sh/sh.md (FPSCR_MODES_REG, FPSCR_STAT_REG, FPSCR_PR, FPSCR_SZ): Add new constants. (UNSPECV_FPSCR_MODES, UNSPECV_FPSCR_STAT): Add new unspecv constants. (movpsi): Use TARGET_FPU_ANY condition, invoke gen_fpu_switch. (fpu_switch): Add use and set of FPSCR_STAT_REG and FPSCR_MODES_REG. Use TARGET_FPU_ANY condition. (fpu_switch peephole2): Remove. (fpu_switch split): Use simple_mem_operand to capture the mem and adjust split implementation. (extend_psi_si, truncate_si_psi): New insns. (toggle_sz, toggle_pr): Use FPSCR_SZ, FPSCR_PR constants. Add set of FPSCR_MODES_REG. (push_e, push_4, pop_e, pop_4, movdf_i4, reload_indf__frn, movsf_ie, reload_insf__frn, force_mode_for_call, calli, calli_tbr_rel, calli_pcrel, call_pcrel, call_compact, call_compact_rettramp, call_valuei, call_valuei_tbr_rel, call_valuei_pcrel, call_value_pcrel, call_value_compact, call_value_compact_rettramp, call, call_pop_compact, call_pop_compact_rettramp, call_value, sibcalli, sibcalli_pcrel, sibcalli_thunk, sibcall_pcrel, sibcall_compact, sibcall, sibcall_valuei, sibcall_valuei_pcrel, sibcall_value_pcrel, sibcall_value_compact, sibcall_value, call_value_pop_compact, call_value_pop_compact_rettramp, various unnamed splits): Replace use of FPSCR_REG with use of FPSCR_MODES_REG. Adjust gen_* function uses. (floatsisf2_i4, *floatsisf2_ie): Merge into floatsisf2_i4. (fix_truncsfsi2_i4, *fixsfsi): Merge into fix_truncsfsi2_i4. (cmpgtsf_t, cmpgtsf_t_i4): Merge into cmpgtsf_t. (cmpeqsf_t, cmpeqsf_t_i4): Merge into cmpeqsf_t. (ieee_ccmpeqsf_t, *ieee_ccmpeqsf_t_4): Merge into ieee_ccmpeqsf_t. (udivsi3_i4, divsi3_i4, addsf3_i, subsf3_i, mulsf3_i, fmasf4_i, *fmasf4, divsf3_i, floatsisf2_i4, fix_truncsfsi2_i4, cmpgtsf_t, cmpeqsf_t, ieee_ccmpeqsf_t, sqrtsf2_i, rsqrtsf2, fsca, adddf3_i, subdf3_i, muldf3_i, divdf3_i, floatsidf2_i, fix_truncdfsi2_i, cmpgtdf_t, cmpeqdf_t, *ieee_ccmpeqdf_t, sqrtdf2_i, extendsfdf2_i4, truncdfsf2_i4): Replace use of FPSCR_REG with clobber of FPSCR_STAT_REG and use of FPSCR_MODES_REG. Adjust gen_* function uses. gcc/testsuite/ PR target/53513 * gcc.target/sh/pr54680.c: Adjust matching of lds insn. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh-protos.h trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.h trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/sh/pr54680.c
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #30 from Kazumoto Kojima --- (In reply to Oleg Endo from comment #29) > With this patch there are no new failures for -m4 -ml and -m4 -mb here, > except the ISR failures. I'll also test it for the other combinations > later. I'd like to get the current sh4 working version first. Then fix > other niche problems with ISRs or SH2E. Kaz, what do you think? Sounds reasonable. Please go ahead.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #29 from Oleg Endo --- (In reply to Oleg Endo from comment #28) > Created attachment 33727 [details] > Using virtual FPSCR registers to model insn dependencies > > The problem is the define_split and the peephole2 patterns below the > "fpu_switch" insn. I don't know how/if that was working before. I've > removed the peephole2 pattern and rewrote the split pattern, which fixes the > failure above. I'll re-test the whole thing again. With this patch there are no new failures for -m4 -ml and -m4 -mb here, except the ISR failures. I'll also test it for the other combinations later. I'd like to get the current sh4 working version first. Then fix other niche problems with ISRs or SH2E. Kaz, what do you think?
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 Oleg Endo changed: What|Removed |Added Attachment #33721|0 |1 is obsolete|| --- Comment #28 from Oleg Endo --- Created attachment 33727 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33727&action=edit Using virtual FPSCR registers to model insn dependencies (In reply to Oleg Endo from comment #27) > Created attachment 33721 [details] > Using virtual FPSCR registers to model insn dependencies > > Updated patch that avoids the single_set problems by using (clobber (reg:SI > FPSCR_STAT_REG)) instead of a set. This also eliminates the fsca pattern > changes in the previous patch. Since the 'fpu_switch' insn is still a > multiple set insn, it won't be used for delay slot stuffing, but this is a > minor issue that can be addressed later. I'm testing the patch now on > sh-sim. At least 'make all' works. Testing for '-m4 -ml' and '-m4 -mb' shows one new failure (ignoring the ISR failures): FAIL: gcc.c-torture/execute/pr28982a.c -O1 (internal compiler error) FAIL: gcc.c-torture/execute/pr28982a.c -O1 (test for excess errors) The problem is the define_split and the peephole2 patterns below the "fpu_switch" insn. I don't know how/if that was working before. I've removed the peephole2 pattern and rewrote the split pattern, which fixes the failure above. I'll re-test the whole thing again.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 Oleg Endo changed: What|Removed |Added Attachment #33717|0 |1 is obsolete|| --- Comment #27 from Oleg Endo --- Created attachment 33721 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33721&action=edit Using virtual FPSCR registers to model insn dependencies Updated patch that avoids the single_set problems by using (clobber (reg:SI FPSCR_STAT_REG)) instead of a set. This also eliminates the fsca pattern changes in the previous patch. Since the 'fpu_switch' insn is still a multiple set insn, it won't be used for delay slot stuffing, but this is a minor issue that can be addressed later. I'm testing the patch now on sh-sim. At least 'make all' works.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #26 from Oleg Endo --- (In reply to Kazumoto Kojima from comment #25) > (In reply to Oleg Endo from comment #23) > > Kaz, could you please have an early look at it? > > The idea looks OK to me. Build fails on sh4-linux with the patch, though. > Maybe a wrong version? > There is a typo divdf3 expand. s/gen_divdf3 /gen_divdf3_i / > > - expand_df_binop (&gen_divdf3_i, operands); > + emit_insn (gen_divdf3 (operands[0], operands[1], operands[2])); > > With fixing it, yet segfaults during compiling libgcc2.c __powidf2 at No, that was the correct version unfortunately. And that was my infinite-memory problem. Thanks for spotting it. > > #0 0x089706b4 in sh_adjust_cost (insn=0xb7f767e0, link=0xb7f79380, > dep_insn=0xb7f762d0, cost=25) at trunk/gcc/config/sh/sh.c:10908 > 10908 SET_SRC (use_pat))) > (gdb) l > 10903cycle earlier. */ > 10904 else if (reload_completed > 10905 && get_attr_dfp_comp (dep_insn) == DFP_COMP_YES > 10906 && (use_pat = single_set (insn)) > 10907 && ! regno_use_in (REGNO (SET_DEST (single_set > (dep_insn))), > 10908 SET_SRC (use_pat))) > > It looks > > - (use (match_operand:PSI 3 "fpscr_operand" "c"))] > + (set (reg:SI FPSCR_STAT_REG) (const_int 0)) > + (use (reg:SI FPSCR_MODES_REG))] > > breaks the assumption of sh_adjust_cost which divdf3_i has only one > set insn. Yep, single_set now returns null for basically all fp insns and it's not checked there. The default implementation of single_set is also the reason why delay slot stuffing is not working on fp insns anymore.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #25 from Kazumoto Kojima --- (In reply to Oleg Endo from comment #23) > Kaz, could you please have an early look at it? The idea looks OK to me. Build fails on sh4-linux with the patch, though. Maybe a wrong version? There is a typo divdf3 expand. s/gen_divdf3 /gen_divdf3_i / - expand_df_binop (&gen_divdf3_i, operands); + emit_insn (gen_divdf3 (operands[0], operands[1], operands[2])); With fixing it, yet segfaults during compiling libgcc2.c __powidf2 at #0 0x089706b4 in sh_adjust_cost (insn=0xb7f767e0, link=0xb7f79380, dep_insn=0xb7f762d0, cost=25) at trunk/gcc/config/sh/sh.c:10908 10908 SET_SRC (use_pat))) (gdb) l 10903cycle earlier. */ 10904 else if (reload_completed 10905 && get_attr_dfp_comp (dep_insn) == DFP_COMP_YES 10906 && (use_pat = single_set (insn)) 10907 && ! regno_use_in (REGNO (SET_DEST (single_set (dep_insn))), 10908 SET_SRC (use_pat))) It looks - (use (match_operand:PSI 3 "fpscr_operand" "c"))] + (set (reg:SI FPSCR_STAT_REG) (const_int 0)) + (use (reg:SI FPSCR_MODES_REG))] breaks the assumption of sh_adjust_cost which divdf3_i has only one set insn.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 Oleg Endo changed: What|Removed |Added Attachment #33716|0 |1 is obsolete|| --- Comment #24 from Oleg Endo --- Created attachment 33717 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33717&action=edit Using virtual FPSCR registers to model insn dependencies sh_emit_set_t_insn was producing nested parallel patterns. When compiling libgcc (for -m4 -mb) memory consumption goes up a lot. Either there is a bug in the patch, or it's triggering a bug somewhere else which seems to cause an infinite alloc cycle somewhere.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 Oleg Endo changed: What|Removed |Added Attachment #33691|0 |1 is obsolete|| --- Comment #23 from Oleg Endo --- Created attachment 33716 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33716&action=edit Using virtual FPSCR registers to model insn dependencies This is a somewhat larger patch which does a couple of things to address the deficits of the previous approach. 1) The hard reg set is extended by two more registers: - FPSCR_MODES_REG represents any mode bits in FPSCR, which are used by fp insns. - FPSCR_STAT_REG represents the status bits in FPSCR, which are written by fp insns. All fp insns that previously had a (use (match_operand:PSI 2 "fpscr_operand" "") now get a (use (reg:SI FPSCR_MODES_REG)) 2) The 'fpu_switch' insn, which is just a FPSCR load/store, now uses and sets the virtual regs FPSCR_MODES_REG and FPSCR_STAT_REG. This creates a sort of reordering barrier due to the reg uses/sets. 3) Two new dummy insns extend_psi_si and truncate_si_psi are added to convert PSIMode <-> SImode, because we can do only logic on SImode, but FPSCR is described as PSImode. This hack could be removed later maybe. 4) The insns toggle_sz and toggle_pr are now also setting the virtual reg FPSCR_MODES_REG. 5) The fsca insn needed some adjustments for the operand matching, as combine started going different paths. Some of the fsca tests in gcc.target/sh were failing and are fixed by this. 6) sh_emit_set_t_insn is adjusted to emit the correct comparison insns with use/set of virtual FPSCR regs. 7) calc_live_regs is adjusted to ignore virtual regs FPSCR_MODES_REG and FPSCR_STAT_REG, or else it will try to generate push/pop insns for those regs. 8) sh_emit_mode_set now emits the FPSCR store-modify-load insn sequence instead of loading FPSCR from __fpscr_values. 9) Some redundant fp insns and related functions are deleted. I've tried to keep the patch as short as possible and not do too many unrelated changes. I haven't fully tested the patch, so there are probably a couple of fallouts. There are two regressions with this patch. The FPSCR loads are not put into delay slots anymore. Probably because the 'fpu_switch' pattern now has too many sets and the DBR rejects that as a slot candidate. The other thing are the failures in gcc.target/sh w.r.t. interrupt functions, which can be addressed later. Kaz, could you please have an early look at it?
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #22 from Oleg Endo --- (In reply to Oleg Endo from comment #21) > (In reply to Oleg Endo from comment #17) > > > > In the 'addsf3_i' pattern, I've tried replacing the > > > > (use (match_operand:PSI 3 "fpscr_operand" "c")) > > > > with > > > > (set (match_operand:PSI 3 "fpscr_operand" "=&c") > > (unspec:PSI [(match_dup 3)] UNSPEC_FPSCR_SET))] > > > > > > I haven't checked all the other side effects it could have, but at least the > > FMA combine patterns still seem work after that change. > > With those changes above applied to all the other FP insns, the FMA tests in > gcc.target/sh fail. As well as some of the FSCA tests. E.g. 'return sinf (x) + cosf (x)' would expand into two sincos patterns and normally combine would fold them into one. However, it doesn't understand the unspec:PSI stuff and tries to nest it such as: Failed to match this instruction: (parallel [ (set (reg:SI 174) (fix:SI (reg:SF 167))) (set (reg/v:PSI 151 ) (unspec:PSI [ (unspec:PSI [ (reg/v:PSI 151 ) ] UNSPEC_FPSCR) ] UNSPEC_FPSCR)) ]) I've tried adding a predicate that recursively matches those nested unspec:PSI things, which makes combine happy, but then reload would bail out for some reason. Moreover, fp insns that do this (set (match_operand:PSI 3 "fpscr_operand" "=&c") (unspec:PSI [(match_dup 3)] UNSPEC_FPSCR_SET))] won't get eliminated even if their results are unused/dead. I'm trying to come up with some alternative approach.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #21 from Oleg Endo --- (In reply to Oleg Endo from comment #17) > > In the 'addsf3_i' pattern, I've tried replacing the > > (use (match_operand:PSI 3 "fpscr_operand" "c")) > > with > > (set (match_operand:PSI 3 "fpscr_operand" "=&c") > (unspec:PSI [(match_dup 3)] UNSPEC_FPSCR_SET))] > > > I haven't checked all the other side effects it could have, but at least the > FMA combine patterns still seem work after that change. With those changes above applied to all the other FP insns, the FMA tests in gcc.target/sh fail. One problem is that at -O1, FMA patterns won't be expanded initially, but are rather formed during combine. With the more complex FPSCA set/use combine won't match the patterns at -O1 but only at -O2. This is regression is probably acceptable. The other issue is that the fix for PR 56547 doesn't work anymore, because for some reason combine can't figure out what to do with the 1.0 constant. Probably because now it'd need to deal with multiple-set insns, which it doesn't do well. Adding a combine bridge pattern doesn't make it go further, as it won't try to combine 'fpreg = fpreg + 1.0' and 'fpreg = fpreg * fpreg'. In this case it's probably better to do a manual fma combine in the split1 pass. This is more effort, but would also fix the first problem.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #20 from Oleg Endo --- (In reply to Oleg Endo from comment #19) > > No I didn't. That was a patch for PR 63260. Sorry for the noise. Now I have. For both '-m4 -ml' and '-m4 -mb' there are a few new failures: FAIL: gcc.dg/attr-isr.c scan-assembler-times [^f]r[0-9][ \t]*, 8 (SH specific test 'gcc.dg/attr-isr.c' should be moved to 'gcc.target/sh') FAIL: gcc.target/sh/attr-isr-nosave_low_regs.c scan-assembler-not [^f]r1[,0-3] FAIL: gcc.target/sh/attr-isr-nosave_low_regs.c scan-assembler-not [^f]r[0-9][ \t]*, FAIL: gcc.target/sh/attr-isr-trapa.c scan-assembler-not r1[,0-3] FAIL: gcc.target/sh/pr54680.c scan-assembler-times lds\t 6 FAIL: gcc.target/sh/pragma-isr-nosave_low_regs.c scan-assembler-not [^f]r1[,0-3] FAIL: gcc.target/sh/pragma-isr-nosave_low_regs.c scan-assembler-not [^f]r[0-9][ \t]*, FAIL: gcc.target/sh/pragma-isr-trapa.c scan-assembler-not r1[,0-3] FAIL: gcc.target/sh/pragma-isr-trapa2.c scan-assembler-not r1[,0-3] FAIL: gcc.target/sh/pragma-isr-trapa2.c scan-assembler-times [^_]fpscr 3 FAIL: gcc.target/sh/pragma-isr-trapa2.c scan-assembler-times r[0-7]\n 3 The failures seem to be related to ISR-like function handling or the tests need adjustments because they assume that fpscr is accessed in a particular way, which is now about to change. I'd like to ignore ISR function problems for now, and fix it as part of PR 57339.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #19 from Oleg Endo --- (In reply to Oleg Endo from comment #18) > (In reply to Oleg Endo from comment #15) > > Created attachment 33691 [details] > > a possible patch > > > > The previous patch was buggy, it always generated a PR toggle sequence, even > > if a mode should be set directly. > > I've tested the patch, there are some new failures: > ... No I didn't. That was a patch for PR 63260. Sorry for the noise.
[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #18 from Oleg Endo --- (In reply to Oleg Endo from comment #15) > Created attachment 33691 [details] > a possible patch > > The previous patch was buggy, it always generated a PR toggle sequence, even > if a mode should be set directly. I've tested the patch, there are some new failures: -m4 -mb: FAIL: g++.dg/torture/type-generic-1.C -O0 execution test FAIL: gcc.c-torture/execute/builtins/complex-1.c execution, -O0 FAIL: gcc.c-torture/execute/builtins/complex-1.c execution, -Og -g FAIL: gcc.c-torture/execute/complex-6.c -O0 execution test FAIL: gcc.c-torture/execute/gofast.c -O0 execution test FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution, -O0 FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution, -O1 FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution, -Og -g FAIL: gcc.c-torture/execute/ieee/20030331-1.c execution, -O0 FAIL: gcc.c-torture/execute/ieee/copysign1.c execution, -O0 FAIL: gcc.c-torture/execute/ieee/mzero3.c execution, -O0 FAIL: gcc.dg/torture/type-generic-1.c -O0 execution test -m4 -ml: FAIL: g++.dg/torture/type-generic-1.C -O0 execution test FAIL: g++.dg/torture/type-generic-1.C -O1 execution test FAIL: g++.dg/torture/type-generic-1.C -O2 execution test FAIL: g++.dg/torture/type-generic-1.C -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: g++.dg/torture/type-generic-1.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL: g++.dg/torture/type-generic-1.C -O3 -fomit-frame-pointer execution test FAIL: g++.dg/torture/type-generic-1.C -O3 -g execution test FAIL: g++.dg/torture/type-generic-1.C -Os execution test FAIL: gcc.c-torture/execute/builtins/complex-1.c execution, -O0 FAIL: gcc.c-torture/execute/builtins/complex-1.c execution, -Og -g FAIL: gcc.c-torture/execute/complex-6.c -O0 execution test FAIL: gcc.c-torture/execute/conversion.c -O0 execution test FAIL: gcc.c-torture/execute/gofast.c -O0 execution test FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution, -O0 FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution, -O1 FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution, -Og -g FAIL: gcc.c-torture/execute/ieee/20030331-1.c execution, -O0 FAIL: gcc.c-torture/execute/ieee/copysign1.c execution, -O0 FAIL: gcc.c-torture/execute/ieee/mzero3.c execution, -O0 FAIL: gcc.dg/pr28796-2.c execution test FAIL: gcc.dg/torture/fp-int-convert-float.c -O0 execution test FAIL: gcc.dg/torture/fp-int-convert-float.c -O1 execution test FAIL: gcc.dg/torture/fp-int-convert-float.c -O2 execution test FAIL: gcc.dg/torture/fp-int-convert-float.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.dg/torture/fp-int-convert-float.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL: gcc.dg/torture/fp-int-convert-float.c -O3 -fomit-frame-pointer execution test FAIL: gcc.dg/torture/fp-int-convert-float.c -O3 -g execution test FAIL: gcc.dg/torture/fp-int-convert-float.c -Os execution test FAIL: gcc.dg/torture/type-generic-1.c -O0 execution test FAIL: gcc.dg/torture/type-generic-1.c -O1 execution test FAIL: gcc.dg/torture/type-generic-1.c -O2 execution test FAIL: gcc.dg/torture/type-generic-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.dg/torture/type-generic-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL: gcc.dg/torture/type-generic-1.c -O3 -fomit-frame-pointer execution test FAIL: gcc.dg/torture/type-generic-1.c -O3 -g execution test FAIL: gcc.dg/torture/type-generic-1.c -Os execution test FAIL: gcc.dg/torture/vec-cvt-1.c -O0 execution test FAIL: gcc.dg/torture/vec-cvt-1.c -O1 execution test FAIL: gcc.dg/torture/vec-cvt-1.c -O2 execution test FAIL: gcc.dg/torture/vec-cvt-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.dg/torture/vec-cvt-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL: gcc.dg/torture/vec-cvt-1.c -O3 -fomit-frame-pointer execution test FAIL: gcc.dg/torture/vec-cvt-1.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test FAIL: gcc.dg/torture/vec-cvt-1.c -O3 -fomit-frame-pointer -funroll-loops execution test FAIL: gcc.dg/torture/vec-cvt-1.c -O3 -g execution test FAIL: gcc.dg/torture/vec-cvt-1.c -Os execution test