[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-12-13 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #44 from Oleg Endo  ---
Author: olegendo
Date: Sat Dec 13 13:17:55 2014
New Revision: 218707

URL: https://gcc.gnu.org/viewcvs?rev=218707&root=gcc&view=rev
Log:
gcc/testsuite/
PR target/53513
* gcc.target/sh/attr-isr-nosave_low_regs.c: Fix matching of expected
register push/pop sequences.
* gcc.target/sh/attr-isr.c: Likewise.
* gcc.target/sh/attr-isr-trapa.c: Likewise.
* gcc.target/sh/pragma-isr-nosave_low_regs.c: Likewise.
* gcc.target/sh/pragma-isr-trapa.c: Likewise.
* gcc.target/sh/pragma-isr-trapa2.c: Likewise.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/attr-isr-nosave_low_regs.c
trunk/gcc/testsuite/gcc.target/sh/attr-isr-trapa.c
trunk/gcc/testsuite/gcc.target/sh/attr-isr.c
trunk/gcc/testsuite/gcc.target/sh/pragma-isr-nosave_low_regs.c
trunk/gcc/testsuite/gcc.target/sh/pragma-isr-trapa.c
trunk/gcc/testsuite/gcc.target/sh/pragma-isr-trapa2.c


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-12-10 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #43 from Oleg Endo  ---
Author: olegendo
Date: Wed Dec 10 08:31:32 2014
New Revision: 218563

URL: https://gcc.gnu.org/viewcvs?rev=218563&root=gcc&view=rev
Log:
gcc/
PR target/53513
* doc/extend.texi (__builtin_sh_set_fpscr): Fix typo.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/doc/extend.texi


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-12-09 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #42 from Oleg Endo  ---
Author: olegendo
Date: Wed Dec 10 00:21:36 2014
New Revision: 218551

URL: https://gcc.gnu.org/viewcvs?rev=218551&root=gcc&view=rev
Log:
gcc/
PR target/53513
* doc/extend.texi (__builtin_sh_get_fpscr, __builtin_sh_get_fpscr):
Document it.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/doc/extend.texi


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-12-07 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #41 from Oleg Endo  ---
Author: olegendo
Date: Sun Dec  7 23:19:59 2014
New Revision: 218472

URL: https://gcc.gnu.org/viewcvs?rev=218472&root=gcc&view=rev
Log:
gcc/testsuite/
PR target/53513
* gcc.target/sh/pr54602-4.c: Fix matching of rte-nop sequence.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/pr54602-4.c


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-18 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #40 from Oleg Endo  ---
Author: olegendo
Date: Sat Oct 18 10:51:08 2014
New Revision: 216424

URL: https://gcc.gnu.org/viewcvs?rev=216424&root=gcc&view=rev
Log:
gcc/
PR target/53513
* config/sh/sh-modes.def (PSI): Remove.
* config/sh/sh-protos.h (get_fpscr_rtx): Remove.
* config/sh/sh.c (fpscr_rtx, get_fpscr_rtx): Remove.
(sh_reorg): Remove commented out FPSCR code.
(fpscr_set_from_mem): Use SImode instead of PSImode.  Emit lds_fpscr
insn instead of move insn.
(sh_hard_regno_mode_ok): Return SImode for FPSCR.
(sh_legitimate_address_p, sh_legitimize_reload_address): Remove PSImode
handling.
(sh_emit_mode_set): Emit lds_fpscr and sts_fpscr insns.
(sh1_builtin_p): Uncomment.
(SH_BLTIN_UV 25, SH_BLTIN_VU 26): New macros.
(bdesc): Add __builtin_sh_get_fpscr and __builtin_sh_set_fpscr.
* config/sh/sh/predicates.md (fpscr_operand): Simplify.
(fpscr_movsrc_operand, fpscr_movdst_operand): New predicates.
(general_movsrc_operand, general_movdst_operand): Disallow
fpscr_operand.
* config/sh/sh.md (FPSCR_FR): New constant.
(push_fpscr): Emit sts_fpscr insn.
(pop_fpscr): Emit lds_fpscr_insn.
(movsi_ie): Disallow FPSCR operands.
(fpu_switch, unnamed related split, extend_psi_si,
truncate_si_psi): Remove insns.
(lds_fpscr, sts_fpscr): New insns.
(toggle_sz, toggle_pr): Use SImode for FPSCR_REG instead of PSImode.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/predicates.md
trunk/gcc/config/sh/sh-modes.def
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-17 Thread kkojima at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #39 from Kazumoto Kojima  ---
(In reply to Oleg Endo from comment #38)
> ... there are no new failures for -m4 -ml and -m4 -mb.  I'm tempted to apply
> it.  Kaz, do you have any objections?

I have no objection.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-17 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #38 from Oleg Endo  ---
(In reply to Oleg Endo from comment #37)
> Created attachment 33751 [details]
> Use SImode for FPSCR, add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr
> 
> I'm now testing this patch.

... there are no new failures for -m4 -ml and -m4 -mb.  I'm tempted to apply
it.  Kaz, do you have any objections?


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-17 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

Oleg Endo  changed:

   What|Removed |Added

  Attachment #33745|0   |1
is obsolete||

--- Comment #37 from Oleg Endo  ---
Created attachment 33751
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33751&action=edit
Use SImode for FPSCR, add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr

I'm now testing this patch.  It removes The PSImode for FPSCR completely and
splits the fpu_switch insn into two separate lds_fpscr and sts_fpscr insns. 
This allows relaxing the insn dependencies.  Also now that sts_fpscr becomes a
single-set insn, it can be stuffed into delay slots.
I've also removed the alternatives for FPSCR <-> memory, except the pre-dec and
post-inc ones, which are needed for push and pop insns.  As a consequence, a
user initiated __builtin_sh_get_fpscr () to a memory will always be ferried
though a general register.  There is some room for improvement, but it goes
into the direction of address-mode-selection optimization, which can be done
later.

When doing a __builtin_sh_set_fpscr (value) the compiler will always insert
code to preserve the current FPSCR FR, SZ, PR mode bits.  This always involves
getting the current FPSCR into a general register first and then loading FPSCR
from a general register.  Thus we can omit FPSCR loads from memory for now.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-17 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #36 from Oleg Endo  ---
(In reply to Oleg Endo from comment #35)
> Created attachment 33745 [details]
> Use SImode for FPSCR, add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr
> 
> So I ended up removing the usage of PSImode for FPSCR and using SImode
> instead.  I've tested this patch on r216173, together with the already
> applied attachment 33727 [details] patch, with -m4 -ml and -m4 -mb.  There
> is one new failure:
> 
> FAIL: gcc.c-torture/execute/20021120-1.c   -O2 -flto -fuse-linker-plugin
> -fno-fat-lto-objects  (internal compiler error)
> 
> error: insn does not satisfy its constraints:
> (insn 1491 1489 1176 5 (set (reg:SI 12 r12)
> (plus:SI (reg:SI 13 r13)
> (const_int 24 [0x18]))) 20021120-1.c:39 76 {*addsi3_compact}
>  (nil))
> 
> 
> Summary:
> 
> insn 1173 stores the fpscr to r12
> 
> the mode switch (xor op) is done on r12
> 
> insn 1489 writes r12 (new fpscr value) on the stack
> 
> insn 1491 tries to recalculate the stack address and put it into r12, to
> satisfy the memory constraints of fpu_switch (which now accepts post-inc
> load and simple register address).
> 
> insn 1776 tries to load it from the stack
> 
> 
> Reload generates a wrong insn addsi3 insn.  Something similar was happening
> in PR 55212, too.

Disallowing memory operands (except pre-dec store, post-inc load for push,pop)
in the fpu_switch pattern fixes that failure.  Although it might result in less
optimal code in some cases where user code wants to store the current fpscr in
memory.  On the other hand, the generated memory address code in such cases is
not very good anyway.  Thus it's probably better to drop that for now.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-17 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #35 from Oleg Endo  ---
Created attachment 33745
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33745&action=edit
Use SImode for FPSCR, add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr

So I ended up removing the usage of PSImode for FPSCR and using SImode instead.
 I've tested this patch on r216173, together with the already applied
attachment 33727 patch, with -m4 -ml and -m4 -mb.  There is one new failure:

FAIL: gcc.c-torture/execute/20021120-1.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  (internal compiler error)

error: insn does not satisfy its constraints:
(insn 1491 1489 1176 5 (set (reg:SI 12 r12)
(plus:SI (reg:SI 13 r13)
(const_int 24 [0x18]))) 20021120-1.c:39 76 {*addsi3_compact}
 (nil))


20021120-1.c:58:1: internal compiler error: in reload_cse_simplify_operands, at
postreload.c:415
0x8565017 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
../../gcc-trunk2/gcc/rtl-error.c:110
0x8565055 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../gcc-trunk2/gcc/rtl-error.c:121
0x850c0cc reload_cse_simplify_operands
../../gcc-trunk2/gcc/postreload.c:415
0x850e5ea reload_cse_simplify
../../gcc-trunk2/gcc/postreload.c:127
0x850e5ea reload_cse_regs_1
../../gcc-trunk2/gcc/postreload.c:224
0x850e6a2 reload_cse_regs
../../gcc-trunk2/gcc/postreload.c:72
0x850e6a2 execute
../../gcc-trunk2/gcc/postreload.c:2348


At that place, reload it's trying to stuff with the fpu_switch insn:

(insn 293 1480 1173 5 (parallel [
(set (mem/c:DF (reg:SI 12 r12) [4 %sfp+-344 S8 A32])
(reg:DF 70 fr6))
(use (reg:SI 154 fpscr0))
(clobber (scratch:SI))
]) 20021120-1.c:39 289 {movdf_i4}
 (nil))
(insn 1173 293 1483 5 (parallel [
(set (reg:SI 12 r12)
(reg/v:SI 151 ))
(use (reg:SI 155 fpscr1))
(use (reg:SI 154 fpscr0))
(set (reg:SI 155 fpscr1)
(unspec_volatile:SI [
(const_int 0 [0])
] UNSPECV_FPSCR_STAT))
(set (reg:SI 154 fpscr0)
(unspec_volatile:SI [
(const_int 0 [0])
] UNSPECV_FPSCR_MODES))
]) 20021120-1.c:39 434 {fpu_switch}
 (nil))
(insn 1483 1173 1484 5 (set (reg:SI 13 r13)
(const_int 252 [0xfc])) 20021120-1.c:39 258 {movsi_ie}
 (nil))
(insn 1484 1483 1485 5 (set (reg:SI 13 r13)
(plus:SI (reg:SI 13 r13)
(reg/f:SI 15 r15))) 20021120-1.c:39 76 {*addsi3_compact}
 (expr_list:REG_EQUIV (plus:SI (reg/f:SI 15 r15)
(const_int 252 [0xfc]))
(nil)))
(insn 1485 1484 1174 5 (set (mem/c:SI (plus:SI (reg:SI 13 r13)
(const_int 24 [0x18])) [4 %sfp+-76 S4 A32])
(reg:SI 12 r12)) 20021120-1.c:39 258 {movsi_ie}
 (nil))
(note 1174 1485 1490 5 NOTE_INSN_DELETED)
(insn 1490 1174 1175 5 (set (reg:SI 13 r13)
(const_int 524288 [0x8])) 20021120-1.c:39 258 {movsi_ie}
 (nil))
(insn 1175 1490 1487 5 (set (reg:SI 12 r12)
(xor:SI (reg:SI 12 r12)
(reg:SI 13 r13))) 20021120-1.c:39 138 {*xorsi3_compact}
 (nil))
(insn 1487 1175 1488 5 (set (reg:SI 13 r13)
(const_int 252 [0xfc])) 20021120-1.c:39 258 {movsi_ie}
 (nil))
(insn 1488 1487 1489 5 (set (reg:SI 13 r13)
(plus:SI (reg:SI 13 r13)
(reg/f:SI 15 r15))) 20021120-1.c:39 76 {*addsi3_compact}
 (expr_list:REG_EQUIV (plus:SI (reg/f:SI 15 r15)
(const_int 252 [0xfc]))
(nil)))
(insn 1489 1488 1491 5 (set (mem/c:SI (plus:SI (reg:SI 13 r13)
(const_int 24 [0x18])) [4 %sfp+-76 S4 A32])
(reg:SI 12 r12)) 20021120-1.c:39 258 {movsi_ie}
 (nil))
(insn 1491 1489 1176 5 (set (reg:SI 12 r12)
(plus:SI (reg:SI 13 r13)
(const_int 24 [0x18]))) 20021120-1.c:39 76 {*addsi3_compact}
 (nil))
(insn 1176 1491 1496 5 (parallel [
(set (reg/v:SI 151 )
(mem/c:SI (reg:SI 12 r12) [4 %sfp+-76 S4 A32]))
(use (reg:SI 155 fpscr1))
(use (reg:SI 154 fpscr0))
(set (reg:SI 155 fpscr1)
(unspec_volatile:SI [
(const_int 0 [0])
] UNSPECV_FPSCR_STAT))
(set (reg:SI 154 fpscr0)
(unspec_volatile:SI [
(const_int 0 [0])
] UNSPECV_FPSCR_MODES))
]) 20021120-1.c:39 434 {fpu_switch}
 (nil))
(insn 1496 1176 1497 5 (set (reg:SI 12 r12)
(const_int 16 [0x10])) 20021120-1.c:39 258 {movsi_ie}
 (nil))


Summary:

insn 1173 stores the fpscr to r12

the mode switch (xor op) is done on r12

insn 1489 writes r12 (new fpscr value) on the stack

insn 1491 tries to recalculate the stack address and put it into r12, to
satisfy the memory constraints of fpu_switch (which now accepts post-inc load
and 

[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-16 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #34 from Oleg Endo  ---
Test run for 

  -m2e -ml, -m2a -mb, -m2a-single -mb, -m4-single-ml

has finished and shows no new failures (except the ISR stuff).


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-16 Thread kkojima at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #33 from Kazumoto Kojima  ---
(In reply to Oleg Endo from comment #32)
> I'd propose the following for SH:
> 
> unsigned int __builtin_sh_get_fpscr ()
> void __builtin_sh_set_fpscr (unsigned int)
> 
> Any opinions or feedback regarding the matter?

Looks fine to me.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-16 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #32 from Oleg Endo  ---
As a next step, I'd like to add built-in functions to get/set the FPSCR on SH.

We have the old __get_fpscr and __set_fpscr functions in libgcc which modify
the __fpscr_values array and the FPSCR.

One idea would be to make __get_fpscr and __set_fpscr built-in functions, so
that new code never generates the respective library calls.  I don't know how
expected/unexpected that would be for users and their existing code.
For example, the current glibc does this in sysdeps/sh/sh4/fpu/fpu_control.h:

#if defined __GNUC__
__BEGIN_DECLS

/* GCC provides this function.  */
extern void __set_fpscr (unsigned long);
#define _FPU_SETCW(cw) __set_fpscr ((cw))
#else
#define _FPU_SETCW(cw) __asm__ ("lds %0,fpscr" : : "r" (cw))
#endif

By making __set_fpscr a builtin it would automagically work, but would never
update the __fpscr_values.  I'm not sure what kind of other fpscr related work
arounds build on top of that.

Alternatively, we could name the built-in functions differently.  I've briefly
checked the docs of other targets, this is what I've found:

aarch64:
 unsigned int __builtin_aarch64_get_fpcr ()
 void __builtin_aarch64_set_fpcr (unsigned int)
 unsigned int __builtin_aarch64_get_fpsr ()
 void __builtin_aarch64_set_fpsr (unsigned int)

ARM:
unsigned int __builtin_arm_get_fpscr ()
 void __builtin_arm_set_fpscr (unsigned int)

MIPS:
unsigned int __builtin_mips_get_fcsr (void)
void __builtin_mips_set_fcsr (unsigned int value)

I'd propose the following for SH:

unsigned int __builtin_sh_get_fpscr ()
void __builtin_sh_set_fpscr (unsigned int)

Any opinions or feedback regarding the matter?


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-16 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #31 from Oleg Endo  ---
Author: olegendo
Date: Thu Oct 16 10:58:36 2014
New Revision: 216307

URL: https://gcc.gnu.org/viewcvs?rev=216307&root=gcc&view=rev
Log:
gcc/
PR target/53513
* config/sh/sh-protos.h (emit_sf_insn, emit_df_insn, expand_sf_unop,
expand_sf_binop, expand_df_unop, expand_df_binop): Remove.

* config/sh/sh.c (sh_emit_set_t_insn): Adjust generated insn pattern
to match fp insn patterns.
(calc_live_regs): Add FPSCR_MODES_REG and FPSCR_STAT_REG to the ignore
list.
(emit_sf_insn, emit_df_insn, expand_sf_unop, expand_sf_binop,
expand_df_unop, expand_df_binop): Remove.
(sh_conditional_register_usage): Mark FPSCR_MODES_REG and
FPSCR_STAT_REG as not call clobbered.
(sh_emit_mode_set): Emit fpscr store-modify-load sequence instead of
invoking fpscr_set_from_mem.

* config/sh/sh.h (MAX_REGISTER_NAME_LENGTH): Increase to 6.
(SH_REGISTER_NAMES_INITIALIZER): Add names for FPSCR_MODES_REG and
FPSCR_STAT_REG.
(REGISTER_NAMES): Adjust.
(SPECIAL_REGISTER_P): Add FPSCR_MODES_REG and FPSCR_STAT_REG.
(FIRST_PSEUDO_REGISTER): Increase to 156.
(DWARF_FRAME_REGISTERS): Define as 153 to keep the original value.
(FIXED_REGISTERS, CALL_USED_REGISTERS): Add FPSCR_MODES_REG and
FPSCR_STAT_REG.
(REG_CLASS_CONTENTS): Adjust ALL_REGS bit mask to include
FPSCR_MODES_REG and FPSCR_STAT_REG.
(REG_ALLOC_ORDER): Add FPSCR_MODES_REG and FPSCR_STAT_REG.

* config/sh/sh.md (FPSCR_MODES_REG, FPSCR_STAT_REG, FPSCR_PR,
FPSCR_SZ): Add new constants.
(UNSPECV_FPSCR_MODES, UNSPECV_FPSCR_STAT): Add new unspecv constants.

(movpsi): Use TARGET_FPU_ANY condition, invoke gen_fpu_switch.
(fpu_switch): Add use and set of FPSCR_STAT_REG and FPSCR_MODES_REG.
Use TARGET_FPU_ANY condition.
(fpu_switch peephole2): Remove.
(fpu_switch split): Use simple_mem_operand to capture the mem and
adjust split implementation.
(extend_psi_si, truncate_si_psi): New insns.
(toggle_sz, toggle_pr): Use FPSCR_SZ, FPSCR_PR constants.  Add
set of FPSCR_MODES_REG.

(push_e, push_4, pop_e, pop_4, movdf_i4, reload_indf__frn, movsf_ie,
reload_insf__frn, force_mode_for_call, calli, calli_tbr_rel,
calli_pcrel, call_pcrel, call_compact, call_compact_rettramp,
call_valuei, call_valuei_tbr_rel, call_valuei_pcrel, call_value_pcrel,
call_value_compact, call_value_compact_rettramp, call,
call_pop_compact, call_pop_compact_rettramp, call_value, sibcalli,
sibcalli_pcrel, sibcalli_thunk, sibcall_pcrel, sibcall_compact,
sibcall, sibcall_valuei, sibcall_valuei_pcrel, sibcall_value_pcrel,
sibcall_value_compact, sibcall_value, call_value_pop_compact,
call_value_pop_compact_rettramp, various unnamed splits):
Replace use of FPSCR_REG with use of FPSCR_MODES_REG.  Adjust gen_*
function uses.

(floatsisf2_i4, *floatsisf2_ie): Merge into floatsisf2_i4.
(fix_truncsfsi2_i4, *fixsfsi): Merge into fix_truncsfsi2_i4.
(cmpgtsf_t, cmpgtsf_t_i4): Merge into cmpgtsf_t.
(cmpeqsf_t, cmpeqsf_t_i4): Merge into cmpeqsf_t.
(ieee_ccmpeqsf_t, *ieee_ccmpeqsf_t_4): Merge into ieee_ccmpeqsf_t.

(udivsi3_i4, divsi3_i4, addsf3_i, subsf3_i, mulsf3_i, fmasf4_i,
*fmasf4, divsf3_i, floatsisf2_i4, fix_truncsfsi2_i4, cmpgtsf_t,
cmpeqsf_t, ieee_ccmpeqsf_t, sqrtsf2_i, rsqrtsf2, fsca, adddf3_i,
subdf3_i, muldf3_i, divdf3_i, floatsidf2_i, fix_truncdfsi2_i,
cmpgtdf_t, cmpeqdf_t, *ieee_ccmpeqdf_t, sqrtdf2_i, extendsfdf2_i4,
truncdfsf2_i4): Replace use of FPSCR_REG with clobber of FPSCR_STAT_REG
and use of FPSCR_MODES_REG.  Adjust gen_* function uses.

gcc/testsuite/
PR target/53513
* gcc.target/sh/pr54680.c: Adjust matching of lds insn.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.h
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/pr54680.c


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-15 Thread kkojima at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #30 from Kazumoto Kojima  ---
(In reply to Oleg Endo from comment #29)
> With this patch there are no new failures for -m4 -ml and -m4 -mb here,
> except the ISR failures.  I'll also test it for the other combinations
> later.  I'd like to get the current sh4 working version first.  Then fix
> other niche problems with ISRs or SH2E.  Kaz, what do you think?

Sounds reasonable.  Please go ahead.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-15 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #29 from Oleg Endo  ---
(In reply to Oleg Endo from comment #28)
> Created attachment 33727 [details]
> Using virtual FPSCR registers to model insn dependencies
> 
> The problem is the define_split and the peephole2 patterns below the
> "fpu_switch" insn.  I don't know how/if that was working before.  I've
> removed the peephole2 pattern and rewrote the split pattern, which fixes the
> failure above.  I'll re-test the whole thing again.

With this patch there are no new failures for -m4 -ml and -m4 -mb here, except
the ISR failures.  I'll also test it for the other combinations later.  I'd
like to get the current sh4 working version first.  Then fix other niche
problems with ISRs or SH2E.  Kaz, what do you think?


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-15 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

Oleg Endo  changed:

   What|Removed |Added

  Attachment #33721|0   |1
is obsolete||

--- Comment #28 from Oleg Endo  ---
Created attachment 33727
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33727&action=edit
Using virtual FPSCR registers to model insn dependencies

(In reply to Oleg Endo from comment #27)
> Created attachment 33721 [details]
> Using virtual FPSCR registers to model insn dependencies
> 
> Updated patch that avoids the single_set problems by using (clobber (reg:SI
> FPSCR_STAT_REG)) instead of a set.  This also eliminates the fsca pattern
> changes in the previous patch.  Since the 'fpu_switch' insn is still a
> multiple set insn, it won't be used for delay slot stuffing, but this is a
> minor issue that can be addressed later.  I'm testing the patch now on
> sh-sim.  At least 'make all' works.

Testing for '-m4 -ml' and '-m4 -mb' shows one new failure (ignoring the ISR
failures):

FAIL: gcc.c-torture/execute/pr28982a.c   -O1  (internal compiler error)
FAIL: gcc.c-torture/execute/pr28982a.c   -O1  (test for excess errors)

The problem is the define_split and the peephole2 patterns below the
"fpu_switch" insn.  I don't know how/if that was working before.  I've removed
the peephole2 pattern and rewrote the split pattern, which fixes the failure
above.  I'll re-test the whole thing again.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-15 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

Oleg Endo  changed:

   What|Removed |Added

  Attachment #33717|0   |1
is obsolete||

--- Comment #27 from Oleg Endo  ---
Created attachment 33721
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33721&action=edit
Using virtual FPSCR registers to model insn dependencies

Updated patch that avoids the single_set problems by using (clobber (reg:SI
FPSCR_STAT_REG)) instead of a set.  This also eliminates the fsca pattern
changes in the previous patch.  Since the 'fpu_switch' insn is still a multiple
set insn, it won't be used for delay slot stuffing, but this is a minor issue
that can be addressed later.  I'm testing the patch now on sh-sim.  At least
'make all' works.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-15 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #26 from Oleg Endo  ---
(In reply to Kazumoto Kojima from comment #25)
> (In reply to Oleg Endo from comment #23)
> > Kaz, could you please have an early look at it?
> 
> The idea looks OK to me.  Build fails on sh4-linux with the patch, though.
> Maybe a wrong version?
> There is a typo divdf3 expand.  s/gen_divdf3 /gen_divdf3_i /
> 
> -  expand_df_binop (&gen_divdf3_i, operands);
> +  emit_insn (gen_divdf3 (operands[0], operands[1], operands[2]));
> 
> With fixing it, yet segfaults during compiling libgcc2.c __powidf2 at

No, that was the correct version unfortunately.  And that was my
infinite-memory problem.  Thanks for spotting it.

> 
> #0  0x089706b4 in sh_adjust_cost (insn=0xb7f767e0, link=0xb7f79380, 
> dep_insn=0xb7f762d0, cost=25) at trunk/gcc/config/sh/sh.c:10908
> 10908 SET_SRC (use_pat)))
> (gdb) l
> 10903cycle earlier.  */
> 10904 else if (reload_completed
> 10905  && get_attr_dfp_comp (dep_insn) == DFP_COMP_YES
> 10906  && (use_pat = single_set (insn))
> 10907  && ! regno_use_in (REGNO (SET_DEST (single_set
> (dep_insn))),
> 10908 SET_SRC (use_pat)))
> 
> It looks
> 
> -   (use (match_operand:PSI 3 "fpscr_operand" "c"))]
> +   (set (reg:SI FPSCR_STAT_REG) (const_int 0))
> +   (use (reg:SI FPSCR_MODES_REG))]
> 
> breaks the assumption of sh_adjust_cost which divdf3_i has only one
> set insn.

Yep, single_set now returns null for basically all fp insns and it's not
checked there.  The default implementation of single_set is also the reason why
delay slot stuffing is not working on fp insns anymore.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-14 Thread kkojima at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #25 from Kazumoto Kojima  ---
(In reply to Oleg Endo from comment #23)
> Kaz, could you please have an early look at it?

The idea looks OK to me.  Build fails on sh4-linux with the patch, though.
Maybe a wrong version?
There is a typo divdf3 expand.  s/gen_divdf3 /gen_divdf3_i /

-  expand_df_binop (&gen_divdf3_i, operands);
+  emit_insn (gen_divdf3 (operands[0], operands[1], operands[2]));

With fixing it, yet segfaults during compiling libgcc2.c __powidf2 at

#0  0x089706b4 in sh_adjust_cost (insn=0xb7f767e0, link=0xb7f79380, 
dep_insn=0xb7f762d0, cost=25) at trunk/gcc/config/sh/sh.c:10908
10908 SET_SRC (use_pat)))
(gdb) l
10903cycle earlier.  */
10904 else if (reload_completed
10905  && get_attr_dfp_comp (dep_insn) == DFP_COMP_YES
10906  && (use_pat = single_set (insn))
10907  && ! regno_use_in (REGNO (SET_DEST (single_set
(dep_insn))),
10908 SET_SRC (use_pat)))

It looks

-   (use (match_operand:PSI 3 "fpscr_operand" "c"))]
+   (set (reg:SI FPSCR_STAT_REG) (const_int 0))
+   (use (reg:SI FPSCR_MODES_REG))]

breaks the assumption of sh_adjust_cost which divdf3_i has only one
set insn.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-14 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

Oleg Endo  changed:

   What|Removed |Added

  Attachment #33716|0   |1
is obsolete||

--- Comment #24 from Oleg Endo  ---
Created attachment 33717
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33717&action=edit
Using virtual FPSCR registers to model insn dependencies

sh_emit_set_t_insn was producing nested parallel patterns.

When compiling libgcc (for -m4 -mb) memory consumption goes up a lot.  Either
there is a bug in the patch, or it's triggering a bug somewhere else which
seems to cause an infinite alloc cycle somewhere.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-14 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

Oleg Endo  changed:

   What|Removed |Added

  Attachment #33691|0   |1
is obsolete||

--- Comment #23 from Oleg Endo  ---
Created attachment 33716
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33716&action=edit
Using virtual FPSCR registers to model insn dependencies

This is a somewhat larger patch which does a couple of things to address the
deficits of the previous approach.

1) The hard reg set is extended by two more registers:
- FPSCR_MODES_REG represents any mode bits in FPSCR, which are used by fp
insns.

- FPSCR_STAT_REG represents the status bits in FPSCR, which are written by fp
insns.

All fp insns that previously had a
  (use (match_operand:PSI 2 "fpscr_operand" "")

now get a
  (use (reg:SI FPSCR_MODES_REG))


2) The 'fpu_switch' insn, which is just a FPSCR load/store, now uses and sets
the virtual regs FPSCR_MODES_REG and FPSCR_STAT_REG.  This creates a sort of
reordering barrier due to the reg uses/sets.


3) Two new dummy insns extend_psi_si and truncate_si_psi are added to convert
PSIMode <-> SImode, because we can do only logic on SImode, but FPSCR is
described as PSImode.  This hack could be removed later maybe.


4) The insns toggle_sz and toggle_pr are now also setting the
virtual reg FPSCR_MODES_REG.


5) The fsca insn needed some adjustments for the operand matching, as combine
started going different paths.  Some of the fsca tests in gcc.target/sh were
failing and are fixed by this.


6) sh_emit_set_t_insn is adjusted to emit the correct comparison insns with
use/set of virtual FPSCR regs.


7) calc_live_regs is adjusted to ignore virtual regs FPSCR_MODES_REG and
FPSCR_STAT_REG, or else it will try to generate push/pop insns for those regs.


8) sh_emit_mode_set now emits the FPSCR store-modify-load insn sequence instead
of loading FPSCR from __fpscr_values.


9) Some redundant fp insns and related functions are deleted.


I've tried to keep the patch as short as possible and not do too many unrelated
changes.  I haven't fully tested the patch, so there are probably a couple of
fallouts.

There are two regressions with this patch.  The FPSCR loads are not put into
delay slots anymore.  Probably because the 'fpu_switch' pattern now has too
many sets and the DBR rejects that as a slot candidate.  The other thing are
the failures in gcc.target/sh w.r.t. interrupt functions, which can be
addressed later.

Kaz, could you please have an early look at it?


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-14 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #22 from Oleg Endo  ---
(In reply to Oleg Endo from comment #21)
> (In reply to Oleg Endo from comment #17)
> > 
> > In the 'addsf3_i' pattern, I've tried replacing the
> > 
> > (use (match_operand:PSI 3 "fpscr_operand" "c"))
> > 
> > with
> > 
> > (set (match_operand:PSI 3 "fpscr_operand" "=&c")
> >  (unspec:PSI [(match_dup 3)] UNSPEC_FPSCR_SET))]
> > 
> > 
> > I haven't checked all the other side effects it could have, but at least the
> > FMA combine patterns still seem work after that change.
> 
> With those changes above applied to all the other FP insns, the FMA tests in
> gcc.target/sh fail.

As well as some of the FSCA tests.  E.g. 'return sinf (x) + cosf (x)' would
expand into two sincos patterns and normally combine would fold them into one. 
However, it doesn't understand the unspec:PSI stuff and tries to nest it such
as:
Failed to match this instruction:
(parallel [
(set (reg:SI 174)
(fix:SI (reg:SF 167)))
(set (reg/v:PSI 151 )
(unspec:PSI [
(unspec:PSI [
(reg/v:PSI 151 )
] UNSPEC_FPSCR)
] UNSPEC_FPSCR))
])

I've tried adding a predicate that recursively matches those nested unspec:PSI
things, which makes combine happy, but then reload would bail out for some
reason.  Moreover, fp insns that do this

 (set (match_operand:PSI 3 "fpscr_operand" "=&c")
  (unspec:PSI [(match_dup 3)] UNSPEC_FPSCR_SET))]

won't get eliminated even if their results are unused/dead.
I'm trying to come up with some alternative approach.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-13 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #21 from Oleg Endo  ---
(In reply to Oleg Endo from comment #17)
> 
> In the 'addsf3_i' pattern, I've tried replacing the
> 
> (use (match_operand:PSI 3 "fpscr_operand" "c"))
> 
> with
> 
> (set (match_operand:PSI 3 "fpscr_operand" "=&c")
>  (unspec:PSI [(match_dup 3)] UNSPEC_FPSCR_SET))]
> 
> 
> I haven't checked all the other side effects it could have, but at least the
> FMA combine patterns still seem work after that change.

With those changes above applied to all the other FP insns, the FMA tests in
gcc.target/sh fail.

One problem is that at -O1, FMA patterns won't be expanded initially, but are
rather formed during combine.  With the more complex FPSCA set/use combine
won't match the patterns at -O1 but only at -O2.  This is regression is
probably acceptable.

The other issue is that the fix for PR 56547 doesn't work anymore, because for
some reason combine can't figure out what to do with the 1.0 constant. 
Probably because now it'd need to deal with multiple-set insns, which it
doesn't do well.  Adding a combine bridge pattern doesn't make it go further,
as it won't try to combine 'fpreg = fpreg + 1.0' and 'fpreg = fpreg * fpreg'. 
In this case it's probably better to do a manual fma combine in the split1
pass.  This is more effort, but would also fix the first problem.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-13 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #20 from Oleg Endo  ---
(In reply to Oleg Endo from comment #19)
> 
> No I didn't.  That was a patch for PR 63260.  Sorry for the noise.


Now I have.  For both '-m4 -ml' and '-m4 -mb' there are a few new failures:

FAIL: gcc.dg/attr-isr.c scan-assembler-times [^f]r[0-9][ \t]*, 8
(SH specific test 'gcc.dg/attr-isr.c' should be moved to 'gcc.target/sh')

FAIL: gcc.target/sh/attr-isr-nosave_low_regs.c scan-assembler-not [^f]r1[,0-3]
FAIL: gcc.target/sh/attr-isr-nosave_low_regs.c scan-assembler-not [^f]r[0-9][
\t]*,
FAIL: gcc.target/sh/attr-isr-trapa.c scan-assembler-not r1[,0-3]
FAIL: gcc.target/sh/pr54680.c scan-assembler-times lds\t 6
FAIL: gcc.target/sh/pragma-isr-nosave_low_regs.c scan-assembler-not
[^f]r1[,0-3]
FAIL: gcc.target/sh/pragma-isr-nosave_low_regs.c scan-assembler-not [^f]r[0-9][
\t]*,
FAIL: gcc.target/sh/pragma-isr-trapa.c scan-assembler-not r1[,0-3]
FAIL: gcc.target/sh/pragma-isr-trapa2.c scan-assembler-not r1[,0-3]
FAIL: gcc.target/sh/pragma-isr-trapa2.c scan-assembler-times [^_]fpscr 3
FAIL: gcc.target/sh/pragma-isr-trapa2.c scan-assembler-times r[0-7]\n 3


The failures seem to be related to ISR-like function handling or the tests need
adjustments because they assume that fpscr is accessed in a particular way,
which is now about to change.

I'd like to ignore ISR function problems for now, and fix it as part of PR
57339.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-12 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #19 from Oleg Endo  ---
(In reply to Oleg Endo from comment #18)
> (In reply to Oleg Endo from comment #15)
> > Created attachment 33691 [details]
> > a possible patch
> > 
> > The previous patch was buggy, it always generated a PR toggle sequence, even
> > if a mode should be set directly.
> 
> I've tested the patch, there are some new failures:
> ...

No I didn't.  That was a patch for PR 63260.  Sorry for the noise.


[Bug target/53513] [SH] Add support for fschg and fpchg insns and improve fenv support

2014-10-12 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

--- Comment #18 from Oleg Endo  ---
(In reply to Oleg Endo from comment #15)
> Created attachment 33691 [details]
> a possible patch
> 
> The previous patch was buggy, it always generated a PR toggle sequence, even
> if a mode should be set directly.

I've tested the patch, there are some new failures:

-m4 -mb:
FAIL: g++.dg/torture/type-generic-1.C   -O0  execution test
FAIL: gcc.c-torture/execute/builtins/complex-1.c execution,  -O0 
FAIL: gcc.c-torture/execute/builtins/complex-1.c execution,  -Og -g 
FAIL: gcc.c-torture/execute/complex-6.c   -O0  execution test
FAIL: gcc.c-torture/execute/gofast.c   -O0  execution test
FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution,  -O0 
FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution,  -O1 
FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution,  -Og -g 
FAIL: gcc.c-torture/execute/ieee/20030331-1.c execution,  -O0 
FAIL: gcc.c-torture/execute/ieee/copysign1.c execution,  -O0 
FAIL: gcc.c-torture/execute/ieee/mzero3.c execution,  -O0 
FAIL: gcc.dg/torture/type-generic-1.c   -O0  execution test

-m4 -ml:
FAIL: g++.dg/torture/type-generic-1.C   -O0  execution test
FAIL: g++.dg/torture/type-generic-1.C   -O1  execution test
FAIL: g++.dg/torture/type-generic-1.C   -O2  execution test
FAIL: g++.dg/torture/type-generic-1.C   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: g++.dg/torture/type-generic-1.C   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: g++.dg/torture/type-generic-1.C   -O3 -fomit-frame-pointer  execution
test
FAIL: g++.dg/torture/type-generic-1.C   -O3 -g  execution test
FAIL: g++.dg/torture/type-generic-1.C   -Os  execution test
FAIL: gcc.c-torture/execute/builtins/complex-1.c execution,  -O0 
FAIL: gcc.c-torture/execute/builtins/complex-1.c execution,  -Og -g 
FAIL: gcc.c-torture/execute/complex-6.c   -O0  execution test
FAIL: gcc.c-torture/execute/conversion.c   -O0  execution test
FAIL: gcc.c-torture/execute/gofast.c   -O0  execution test
FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution,  -O0 
FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution,  -O1 
FAIL: gcc.c-torture/execute/ieee/20010114-2.c execution,  -Og -g 
FAIL: gcc.c-torture/execute/ieee/20030331-1.c execution,  -O0 
FAIL: gcc.c-torture/execute/ieee/copysign1.c execution,  -O0 
FAIL: gcc.c-torture/execute/ieee/mzero3.c execution,  -O0 
FAIL: gcc.dg/pr28796-2.c execution test
FAIL: gcc.dg/torture/fp-int-convert-float.c   -O0  execution test
FAIL: gcc.dg/torture/fp-int-convert-float.c   -O1  execution test
FAIL: gcc.dg/torture/fp-int-convert-float.c   -O2  execution test
FAIL: gcc.dg/torture/fp-int-convert-float.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.dg/torture/fp-int-convert-float.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-float.c   -O3 -fomit-frame-pointer 
execution test
FAIL: gcc.dg/torture/fp-int-convert-float.c   -O3 -g  execution test
FAIL: gcc.dg/torture/fp-int-convert-float.c   -Os  execution test
FAIL: gcc.dg/torture/type-generic-1.c   -O0  execution test
FAIL: gcc.dg/torture/type-generic-1.c   -O1  execution test
FAIL: gcc.dg/torture/type-generic-1.c   -O2  execution test
FAIL: gcc.dg/torture/type-generic-1.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.dg/torture/type-generic-1.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/type-generic-1.c   -O3 -fomit-frame-pointer  execution
test
FAIL: gcc.dg/torture/type-generic-1.c   -O3 -g  execution test
FAIL: gcc.dg/torture/type-generic-1.c   -Os  execution test
FAIL: gcc.dg/torture/vec-cvt-1.c   -O0  execution test
FAIL: gcc.dg/torture/vec-cvt-1.c   -O1  execution test
FAIL: gcc.dg/torture/vec-cvt-1.c   -O2  execution test
FAIL: gcc.dg/torture/vec-cvt-1.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.dg/torture/vec-cvt-1.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/vec-cvt-1.c   -O3 -fomit-frame-pointer  execution test
FAIL: gcc.dg/torture/vec-cvt-1.c   -O3 -fomit-frame-pointer -funroll-all-loops
-finline-functions  execution test
FAIL: gcc.dg/torture/vec-cvt-1.c   -O3 -fomit-frame-pointer -funroll-loops 
execution test
FAIL: gcc.dg/torture/vec-cvt-1.c   -O3 -g  execution test
FAIL: gcc.dg/torture/vec-cvt-1.c   -Os  execution test