[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2016-09-27 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #88 from Oleg Endo  ---
Author: olegendo
Date: Tue Sep 27 12:50:27 2016
New Revision: 240533

URL: https://gcc.gnu.org/viewcvs?rev=240533&root=gcc&view=rev
Log:
gcc/
PR target/51244
* config/sh/sh.c (sh_rtx_costs): Fix return value of SET of movt and
movrt patterns.  Match them before anything else in the SET case.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c

[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2016-09-25 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #87 from Oleg Endo  ---
Author: olegendo
Date: Sun Sep 25 06:59:37 2016
New Revision: 240471

URL: https://gcc.gnu.org/viewcvs?rev=240471&root=gcc&view=rev
Log:
This fixes a fallout that actually goes back to 5.0 but went unnoticed.
The costs for movt and movrt type of insns were not correctly reported
and ifcvt thus made some bad choices for SH.  A new cset_zero pattern
variant is also required to fix the matching for some recent changes
in the middle end.

gcc/
PR target/51244
* config/sh/sh.c (sh_movt_set_dest, sh_movrt_set_dest): Add overloads.
(sh_rtx_costs): Handle SET of movt and movrt patterns.
* config/sh/sh-protos.h (sh_movt_set_dest, sh_movrt_set_dest): Forward
declare new overloads.
* config/sh/sh.md (*cset_zero): Add variant that takes a treg_set_expr
operand.

gcc/testsuite/
PR target/51244
* gcc.target/sh/pr51244-11.c: Add more detailed expected insn matching.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/pr51244-11.c

[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2015-03-01 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #86 from Oleg Endo  ---
I'd like to close this PR as fixed because it's getting too long.  I'll try to
pull out the remaining issues into individual new PRs.


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2015-01-24 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #85 from Oleg Endo  ---
Author: olegendo
Date: Sat Jan 24 13:04:53 2015
New Revision: 220081

URL: https://gcc.gnu.org/viewcvs?rev=220081&root=gcc&view=rev
Log:
gcc/
PR target/49263
PR target/53987
PR target/64345
PR target/59533
PR target/52933
PR target/54236
PR target/51244
* config/sh/sh-protos.h
(sh_extending_set_of_reg::can_use_as_unextended_reg,
sh_extending_set_of_reg::use_as_unextended_reg,
sh_is_nott_insn, sh_movt_set_dest, sh_movrt_set_dest, sh_is_movt_insn,
sh_is_movrt_insn, sh_insn_operands_modified_between_p,
sh_reg_dead_or_unused_after_insn, sh_in_recog_treg_set_expr,
sh_recog_treg_set_expr, sh_split_treg_set_expr): New functions.
(sh_treg_insns): New class.
* config/sh/sh.c (TARGET_LEGITIMATE_COMBINED_INSN): Define target hook.
(scope_counter): New class.
(sh_legitimate_combined_insn, sh_is_nott_insn, sh_movt_set_dest,
sh_movrt_set_dest, sh_reg_dead_or_unused_after_insn,
sh_extending_set_of_reg::can_use_as_unextended_reg,
sh_extending_set_of_reg::use_as_unextended_reg, sh_recog_treg_set_expr,
sh_in_recog_treg_set_expr, sh_try_split_insn_simple,
sh_split_treg_set_expr): New functions.
(addsubcosts): Handle treg_set_expr.
(sh_rtx_costs): Handle IF_THEN_ELSE and ZERO_EXTRACT.
(sh_rtx_costs): Use arith_reg_operand in SIGN_EXTEND and ZERO_EXTEND.
(sh_rtx_costs): Handle additional bit test patterns in EQ and AND cases.
(sh_insn_operands_modified_between_p): Make non-static.
* config/sh/predicates.md (zero_extend_movu_operand): Allow
simple_mem_operand in addition to displacement_mem_operand.
(zero_extend_operand): Don't allow zero_extend_movu_operand.
(treg_set_expr, treg_set_expr_not_const01,
arith_reg_or_treg_set_expr): New predicates.
* config/sh/sh.md (tstsi_t): Use arith_reg_operand and
arith_or_int_operand instead of logical_operand.  Convert to
insn_and_split.  Try to optimize constant operand in splitter.
(tsthi_t, tstqi_t): Fold into *tst_t.  Convert to insn_and_split.
(*tstqi_t_zero): Delete.
(*tst_t_subregs): Add !sh_in_recog_treg_set_expr split condition.
(tstsi_t_and_not): Delete.
(tst_t_zero_extract_eq): Rename to *tst_t_zero_extract.
Convert to insn_and_split.
(unnamed split, tstsi_t_zero_extract_xor,
tstsi_t_zero_extract_subreg_xor_little,
tstsi_t_zero_extract_subreg_xor_big): Delete.
(*tstsi_t_shift_mask): New insn_and_split.
(cmpeqsi_t, cmpgesi_t): Add new split for const_int 0 operands and try
to recombine with surrounding insns when splitting.
(*negtstsi): Add !sh_in_recog_treg_set_expr condition.
(cmp_div0s_0, cmp_div0s_1, *cmp_div0s_0, *cmp_div0s_1): Rewrite as ...
(cmp_div0s, *cmp_div0s_1, *cmp_div0s_2, *cmp_div0s_3, *cmp_div0s_4,
*cmp_div0s_5, *cmp_div0s_6): ... these new insn_and_split patterns.
(*cbranch_div0s): Delete.
(*addc): Convert to insn_and_split.  Use treg_set_expr as 3rd operand.
Try to recombine with surrounding insns when splitting.  Add operand
order variants.
(*addc_t_r, *addc_r_t): Use treg_set_expr_not_const01.
(*addc_r_r_1, *addc_r_lsb, *addc_r_r_lsb, *addc_r_lsb_r, *addc_r_msb,
*addc_r_r_msb, *addc_2r_msb): Delete.
(*addc_2r_lsb): Rename to *addc_2r_t.  Use treg_set_expr.  Add operand
order variant.
(*addc_negreg_t): New insn_and_split.
(*subc): Convert to insn_and_split.  Use treg_set_expr as 3rd operand.
Try to recombine with surrounding insns when splitting.
Add operand order variants.  
(*subc_negt_reg, *subc_negreg_t, *reg_lsb_t, *reg_msb_t): New
insn_and_split patterns.
(*rotcr): Use arith_reg_or_treg_set_expr.  Try to recombine with
surrounding insns when splitting.
(unnamed rotcr split): Use arith_reg_or_treg_set_expr.
(*rotcl): Likewise.  Add zero_extract variant.
(*ashrsi2_31): New insn_and_split.
(*negc): Convert to insn_and_split.  Use treg_set_expr.
(*zero_extendsi2_disp_mem): Update comment.
(movrt_negc, *movrt_negc, nott): Add !sh_in_recog_treg_set_expr split
condition.
(*mov_t_msb_neg, mov_neg_si_t): Use treg_set_expr.  Try to recombine
with surrounding insns when splitting.
(any_treg_expr_to_reg): New insn_and_split.
(*neg_zero_extract_0, *neg_zero_extract_1, *neg_zero_extract_2,
*neg_zero_extract_3, *neg_zero_extract_4, *neg_zero_extract_5,
*neg_zero_extract_6, *zero_extract_0, *zero_extract_1,
*zero_extract_2): New single bit zero extract patterns.
(bld_reg, *bld_regqi): Fold into bld_reg.
(*get_thread_pointersi, store_gbr, *mov_gbr_load,
*mov_gbr_load, *mov_gbr_load, *mov_gbr_load,
*movdi_gbr_load): Use arith_reg_dest instead of register_operand for
set destination.
(set_thread_pointersi, load_gbr): Use arith_reg_operand instead of
register_operand for set source.

gcc/testsuite/
PR target/49263
PR target/53987

[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2014-12-24 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #84 from Oleg Endo  ---
Author: olegendo
Date: Wed Dec 24 21:55:59 2014
New Revision: 219062

URL: https://gcc.gnu.org/viewcvs?rev=219062&root=gcc&view=rev
Log:
gcc/
PR target/51244
* config/sh/sh.md (*mov_t_msb_neg): Convert split into insn_and_split.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2014-12-17 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #83 from Oleg Endo  ---
(In reply to Oleg Endo from comment #71)
> 
> * The RTL pass does the treg combine only when there is a conditional
> branch.  It should also handle conditional move insns (-mpretend-cmove).
> 

It does now.  It also handles nott cbranch sequences by inverting the branch
condition and deleting the nott insn.
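
The nott handling can be pictured with a toy model in plain C.  This is only
a sketch and not the code in sh_treg_combine.cc: the names toy_op,
fold_nott_cbranch and branch_taken are made up for illustration, and it
assumes the T bit is not read again after the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction kinds.  */
enum toy_op { OP_NOTT, OP_BT, OP_BF, OP_OTHER };

/* bt is taken when T == 1, bf is taken when T == 0.  */
bool branch_taken (enum toy_op br, bool t)
{
  assert (br == OP_BT || br == OP_BF);
  return br == OP_BT ? t : !t;
}

/* Rewrite "nott ; bt" as "bf" and "nott ; bf" as "bt", in place.
   Returns the new number of instructions.  Assumes that nothing reads
   the T bit after the branch.  */
size_t fold_nott_cbranch (enum toy_op *insns, size_t n)
{
  size_t out = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (insns[i] == OP_NOTT && i + 1 < n
          && (insns[i + 1] == OP_BT || insns[i + 1] == OP_BF))
        {
          /* Drop the nott and emit the branch with inverted condition.  */
          insns[out++] = insns[i + 1] == OP_BT ? OP_BF : OP_BT;
          i++;
        }
      else
        insns[out++] = insns[i];
    }
  return out;
}
```

For either value of T, branch_taken (OP_BT, !t) equals branch_taken (OP_BF, t),
which is why the nott can be dropped when the branch condition is inverted.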


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2014-12-17 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #82 from Oleg Endo  ---
Author: olegendo
Date: Wed Dec 17 23:08:14 2014
New Revision: 218850

URL: https://gcc.gnu.org/viewcvs?rev=218850&root=gcc&view=rev
Log:
gcc/
PR target/51244
* config/sh/sh_treg_combine.cc (is_conditional_insn): New function.
(cbranch_trace): Add member rtx* condition_rtx_in_insn, initialize it
accordingly in constructor.
(cbranch_trace::branch_condition_rtx_ref): New function.
(cbranch_trace::branch_condition_rtx): Use branch_condition_rtx_ref.
(sh_treg_combine::try_invert_branch_condition): Invert condition rtx
in insn using reversed_comparison_code and validate_change instead of
invert_jump_1.
(sh_treg_combine::execute): Look for conditional insns in basic blocks
in addition to conditional branches.
* config/sh/sh.md (*movsicc_div0s): Remove combine patterns.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md
trunk/gcc/config/sh/sh_treg_combine.cc


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2014-12-17 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #81 from Oleg Endo  ---
Author: olegendo
Date: Wed Dec 17 22:52:21 2014
New Revision: 218847

URL: https://gcc.gnu.org/viewcvs?rev=218847&root=gcc&view=rev
Log:
gcc/
PR target/51244
* config/sh/sh_treg_combine.cc (sh_treg_combine::try_optimize_cbranch):
Combine ccreg inversion and cbranch into inverted cbranch.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh_treg_combine.cc


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2014-11-30 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #80 from Oleg Endo  ---
Author: olegendo
Date: Mon Dec  1 06:50:06 2014
New Revision: 218200

URL: https://gcc.gnu.org/viewcvs?rev=218200&root=gcc&view=rev
Log:
gcc/
PR target/63986
PR target/51244
* config/sh/sh.c (sh_unspec_insn_p,
sh_insn_operands_modified_between_p): New functions.
(sh_split_movrt_negc_to_movt_xor): Do not delete insn if its operands
are modified or if it has side effects, may trap or is volatile.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2014-11-22 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #79 from Oleg Endo  ---
Author: olegendo
Date: Sat Nov 22 16:07:25 2014
New Revision: 217970

URL: https://gcc.gnu.org/viewcvs?rev=217970&root=gcc&view=rev
Log:
gcc/
Backport from mainline
2014-11-22  Oleg Endo  

PR target/63783
PR target/51244
* config/sh/sh_treg_combine.cc (sh_treg_combine::make_not_reg_insn):
Do not emit bitwise not insn.  Emit logical not insn sequence instead.
Adjust related comments throughout the file.

gcc/testsuite/
Backport from mainline
2014-11-22  Oleg Endo  

PR target/63783
PR target/51244
* gcc.target/sh/torture/pr63783-1.c: New.
* gcc.target/sh/torture/pr63783-2.c: New.
* gcc.target/sh/pr51244-20.c: Adjust.
* gcc.target/sh/pr51244-20-sh2a.c: Adjust.

Added:
branches/gcc-4_9-branch/gcc/testsuite/gcc.target/sh/torture/pr63783-1.c
branches/gcc-4_9-branch/gcc/testsuite/gcc.target/sh/torture/pr63783-2.c
Modified:
branches/gcc-4_9-branch/gcc/ChangeLog
branches/gcc-4_9-branch/gcc/config/sh/sh_treg_combine.cc
branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
branches/gcc-4_9-branch/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
branches/gcc-4_9-branch/gcc/testsuite/gcc.target/sh/pr51244-20.c


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2014-11-22 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #78 from Oleg Endo  ---
Author: olegendo
Date: Sat Nov 22 15:50:10 2014
New Revision: 217969

URL: https://gcc.gnu.org/viewcvs?rev=217969&root=gcc&view=rev
Log:
gcc/
PR target/63783
PR target/51244
* config/sh/sh_treg_combine.cc (sh_treg_combine::make_not_reg_insn):
Do not emit bitwise not insn.  Emit logical not insn sequence instead.
Adjust related comments throughout the file.

gcc/testsuite/
PR target/63783
PR target/51244
* gcc.target/sh/torture/pr63783-1.c: New.
* gcc.target/sh/torture/pr63783-2.c: New.
* gcc.target/sh/pr51244-20.c: Adjust.
* gcc.target/sh/pr51244-20-sh2a.c: Adjust.

Added:
trunk/gcc/testsuite/gcc.target/sh/torture/pr63783-1.c
trunk/gcc/testsuite/gcc.target/sh/torture/pr63783-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh_treg_combine.cc
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-20.c


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2014-11-22 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #77 from Oleg Endo  ---
Author: olegendo
Date: Sat Nov 22 15:06:34 2014
New Revision: 217968

URL: https://gcc.gnu.org/viewcvs?rev=217968&root=gcc&view=rev
Log:
gcc/
PR target/63986
PR target/51244
* config/sh/sh.c (sh_is_logical_t_store_expr,
sh_try_omit_signzero_extend): Use rtx_insn* for insn argument.
(sh_split_movrt_negc_to_movt_xor): New function.
(sh_find_set_of_reg): Move to ...
* config/sh/sh-protos.h (sh_find_set_of_reg): ... here and convert
to template function.
(set_of_reg): Use rtx_insn* for insn member.
(sh_is_logical_t_store_expr, sh_try_omit_signzero_extend): Use
rtx_insn* for insn argument.
* config/sh/sh.md (movrt_negc, *movrt_negc): Split into movt-xor
sequence using new sh_split_movrt_negc_to_movt_xor function.
(movrt_xor): Allow also for SH2A.
(*movt_movrt): Delete insns and splits.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2014-09-13 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #76 from Oleg Endo  ---
When compiling the libgcc divsc3 from PR 55212 with "-O2 -m2 -ml" (on sh-lra
branch) the following sequences are generated:

tst     r0,r0
subc    r0,r0   ! r0: T == 0 -> 0x00000000, T == 1 -> 0xFFFFFFFF
not     r0,r0   ! r0: T == 0 -> 0xFFFFFFFF, T == 1 -> 0x00000000
and     #1,r0   ! r0: T == 0 -> 1, T == 1 -> 0

which can be done better as:

tst     r0,r0
mov     #-1,r0
negc    r0,r0

or
tst     r0,r0
movt    r0
xor     #1,r0

and on SH2A:

tst     r0,r0
movrt   r0
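
That all of these sequences leave the same value in r0 can be checked with a
small C model.  This is just a sketch: each function (the names are made up)
takes the T bit produced by the preceding "tst r0,r0" and returns the final
r0, using 32-bit unsigned arithmetic for the register.

```c
#include <assert.h>
#include <stdint.h>

/* subc r0,r0 ; not r0,r0 ; and #1,r0  */
uint32_t seq_subc_not_and (uint32_t t)
{
  assert (t <= 1);
  uint32_t r0 = 0;      /* the incoming value cancels out in r0 - r0 */
  r0 = r0 - r0 - t;     /* subc: 0x00000000 or 0xFFFFFFFF */
  r0 = ~r0;             /* not */
  r0 &= 1;              /* and #1 */
  return r0;
}

/* mov #-1,r0 ; negc r0,r0  */
uint32_t seq_mov_negc (uint32_t t)
{
  assert (t <= 1);
  uint32_t r0 = (uint32_t) -1;  /* mov #-1,r0 */
  r0 = 0u - r0 - t;             /* negc: 0 - Rm - T */
  return r0;
}

/* movt r0 ; xor #1,r0  */
uint32_t seq_movt_xor (uint32_t t)
{
  assert (t <= 1);
  uint32_t r0 = t;      /* movt */
  r0 ^= 1;              /* xor #1 */
  return r0;
}

/* movrt r0 (SH2A only)  */
uint32_t seq_movrt (uint32_t t)
{
  assert (t <= 1);
  return !t;            /* movrt stores the inverted T bit */
}
```

All four functions return 1 for T == 0 and 0 for T == 1.  The model does not
track the T bit after the sequence, which differs between the variants.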


combine is looking for the following patterns:

Failed to match this instruction:
(set (reg:SI 296 [ D.1371 ])
(and:SI (not:SI (reg:SI 147 t))
(const_int 1 [0x1])))

Failed to match this instruction:
(set (reg:SI 147 t)
(and:SI (reg:SI 147 t)
(const_int 1 [0x1])))

(and:SI (reg:SI T_REG) (const_int 1)) is effectively a T -> T nop move which is
supposed to be handled by the "*movtt" insn.  Maybe the case above and the
original eq:SI case in "*movtt" should be added to the t_reg_operand predicate.
 Then the "*movtt" pattern could be simplified to:

(define_insn_and_split "*movtt"
  [(set (reg:SI T_REG) (match_operand 0 "t_reg_operand"))] ...


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2014-05-16 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #75 from Oleg Endo  ---
Author: olegendo
Date: Fri May 16 22:54:32 2014
New Revision: 210535

URL: http://gcc.gnu.org/viewcvs?rev=210535&root=gcc&view=rev
Log:
gcc/
PR target/51244
* config/sh/sh.c (sh_eval_treg_value): Handle t_reg_operand and
negt_reg_operand cases.
* config/sh/sh.md (*cset_zero): Likewise by using cbranch_treg_value
predicate.
* config/sh/predicates.md (cbranch_treg_value): Simplify.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/predicates.md
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-12-06 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #73 from Oleg Endo  ---
Author: olegendo
Date: Fri Dec  6 10:46:53 2013
New Revision: 205734

URL: http://gcc.gnu.org/viewcvs?rev=205734&root=gcc&view=rev
Log:
PR target/51244
PR target/59343
* config/sh/sh.md (*cbranch_t): Check that there are no labels between
the s1 insn and the testing insn.  Remove REG_DEAD note from s1 insn.

PR target/51244
PR target/59343
* gcc.target/sh/pr51244-19.c: Adjust test case.


Modified:
branches/gcc-4_8-branch/gcc/ChangeLog
branches/gcc-4_8-branch/gcc/config/sh/sh.md
branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
branches/gcc-4_8-branch/gcc/testsuite/gcc.target/sh/pr51244-19.c


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-12-05 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #72 from Oleg Endo  ---
The original test case in PR 59343 is an interesting one with regard to T bit
optimizations (or the lack thereof):

void validate_number (char **numbertext)
{
  char *ptr = *numbertext;
  int valid = (ptr != 0) && (*ptr);

  for ( ; valid && *ptr; ++ptr)
    valid = (*ptr >= '0');

  if (!valid)
    *numbertext = 0;
}

with -Os -m4 -mb it is compiled to:

_validate_number:
mov.l   @r4,r2          // [bb 2]
tst     r2,r2
bt/s    .L2
mov     #0,r1

mov.b   @r2,r1          // [bb 3]
tst     r1,r1
mov     #-1,r1
negc    r1,r1

.L2:                    // [bb 4]
mov     #47,r3

.L3:                    // [bb 5]
tst     r1,r1
bt      .L4

mov.b   @r2+,r1         // [bb 6]
tst     r1,r1
bt/s    .L8

cmp/gt  r3,r1           // [bb 7]

bra     .L3
movt    r1

.L4:
mov.l   r1,@r4          // [bb 8]
.L8:
rts
nop


The basic block starting with L3 (bb 5) has three different r1 inputs from [bb
2], [bb 3] and [bb 7].  When sh_treg_combine tries to trace r1 starting in [bb
5]:

tracing (reg/v:SI 1 r1 [orig:185 valid ] [185])

[bb 5]
set of reg not found.  empty BB?

[bb 4]
set of reg not found (cstore)
set not found - aborting trace

Instead it should skip [bb 4] as it doesn't modify r1 or T bit and check [bb 3]
and [bb 2].  Because the setcc insns are not the same in [bb 2], [bb 3] and [bb
7], it would try to eliminate the cstores.  However, in [bb 2] there is no real
cstore but a constant load, which can be replaced with a clrt or sett insn
respectively.  The resulting code could be something like:

mov.l   @r4,r2
mov     #0,r1
tst     r2,r2
bt/s    .L2             // (*)
clrt

mov.b   @r2,r1
tst     r1,r1
movt    r1
tst     r1,r1           // T = !T
.L2:
mov     #47,r3
.L3:
bf      .L4

mov.b   @r2+,r1
tst     r1,r1
bt/s    .L8
bra     .L3
cmp/gt  r3,r1
.L4:
mov.l   r1,@r4
.L8:
rts
nop

(*) The clrt insn actually has to be inserted before the conditional branch,
which is impossible as it modifies the branch condition.  Putting it into the
delay slot however is OK, which is usually done by the DBR pass.  A special
"branch and set/clear T" pseudo insn would be required (requires SH2+) which
produces the sequence above.  A more complicated way would be to create new
basic blocks.

The basic block reordering or similar RTL pass and the clrt/sett optimization
pass should then be able to simplify the code further to:

mov.l   @r4,r2
tst     r2,r2
bf/s    .L4
mov     #0,r1

mov.b   @r2,r1
tst     r1,r1
bt/s    .L4
mov     #47,r3
.L3:
mov.b   @r2+,r1
tst     r1,r1
bt/s    .L8
cmp/gt  r3,r1
bt      .L3
.L4:
mov.l   r1,@r4
.L8:
rts
nop


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-10-12 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #71 from Oleg Endo  ---
(In reply to Oleg Endo from comment #70)
> Author: olegendo
> Date: Sat Oct 12 20:47:22 2013
> New Revision: 203492
> 

The issue raised in comment #59 has been fixed on 4.9.
There are some open issues though, which I will try to address in follow up
patches:

* The helper functions in sh_treg_combine.cc should go into a separate .h + .cc
file.  This would allow re-using them in other places and eliminate the similar
function 'sh_find_set_of_reg' in sh.c

* The RTL pass does the treg combine only when there is a conditional branch. 
It should also handle conditional move insns (-mpretend-cmove).

* The function 'try_combine_comparisons' in sh_treg_combine.cc always introduces
reg-reg copies.  In some cases (DImode comparisons in particular), these
reg-reg moves don't get eliminated afterwards before register allocation.  The
function should check whether creating new pseudos can be avoided by re-using
existing regs.


The sh_treg_combine RTL pass could probably be backported to 4.8 but seems too
intrusive.  Instead something like the patch in comment #64 should do, where
instead of checking for 'no_labels_between_p' it would probably be better to
check if the basic block with the conditional branch has only one predecessor.
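
The difference between the two checks can be sketched on a toy CFG in plain
C.  Nothing below is GCC API: toy_edge, count_preds and has_single_pred are
hypothetical names, and the edge list is only an illustration.  In the
reduced test case from comment #64 the block with the cbranch ([bb 5]) is
entered from both [bb 3] and [bb 4], so a single-predecessor check rejects
it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CFG edge between two basic block indices.  */
struct toy_edge { int src, dest; };

/* Count the edges that enter basic block BB.  */
int count_preds (const struct toy_edge *edges, size_t n_edges, int bb)
{
  int n = 0;

  assert (edges != NULL || n_edges == 0);
  for (size_t i = 0; i < n_edges; i++)
    if (edges[i].dest == bb)
      n++;
  return n;
}

/* True if BB is entered through exactly one edge.  In that case an insn
   found by walking up from BB into its predecessor is the only thing that
   can reach BB, which is not true when BB merges several paths.  */
bool has_single_pred (const struct toy_edge *edges, size_t n_edges, int bb)
{
  return count_preds (edges, n_edges, bb) == 1;
}
```

With the edges 3 -> 5 and 4 -> 5, has_single_pred returns false for block 5,
so the T bit store found in [bb 4] would not be treated as the only input.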


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-10-12 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #70 from Oleg Endo  ---
Author: olegendo
Date: Sat Oct 12 20:47:22 2013
New Revision: 203492

URL: http://gcc.gnu.org/viewcvs?rev=203492&root=gcc&view=rev
Log:
PR target/51244
* config/sh/sh_treg_combine.cc: New SH specific RTL pass.
* config.gcc (SH extra_objs): Add sh_ifcvt.o.
* config/sh/t-sh (sh_treg_combine.o): New entry.
* config/sh/sh.c (sh_fixed_condition_code_regs): New function that
implements the target hook TARGET_FIXED_CONDITION_CODE_REGS.
(register_sh_passes): New function.  Register sh_treg_combine pass.
(sh_option_override): Invoke it.
(sh_canonicalize_comparison): Handle op0_preserve_value.
* sh.md (*cbranch_t): Do not try to optimize missed test and branch
opportunities.  Canonicalize branch condition.
(nott): Allow only if pseudos can be created for non-SH2A.

PR target/51244
* gcc.dg/torture/pr51244-21.c: New.
* gcc.target/sh/pr51244-20.c: New.
* gcc.target/sh/pr51244-20-sh2a.c: New.


Added:
trunk/gcc/config/sh/sh_treg_combine.cc
trunk/gcc/testsuite/gcc.dg/torture/pr51244-21.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-20.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config.gcc
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/config/sh/t-sh
trunk/gcc/testsuite/ChangeLog


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-10-03 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #30889|0                           |1
        is obsolete|                            |

--- Comment #69 from Oleg Endo  ---
Created attachment 30953
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30953&action=edit
RTL pass

(In reply to Oleg Endo from comment #68)
> Created attachment 30889 [details]
> RTL pass
> 
> An updated patch that adds an SH specific RTL pass against current trunk
> (rev 202873), not fully tested.
> 
> CSiBE for '-m2a-single -O2' and '-m4-single -mpretend-cmove -O2' look OK. 
> There are only 2 cases that got actually worse in the set:
> 
> 
> linux-2.4.23-pre3-testplatform/net/ipv4/igmp.s (add_grec):
> 
> before:
> .L459:
>   bt      .L294
>   mov.l   @(24,r13),r1
>   tst     r1,r1
>   bt/s    .L295
>   add     #64,r1
>   mov     r13,r2
>   add     #64,r2
>   mov.l   @(36,r1),r1
>   mov.l   @(32,r2),r2
>   sub     r2,r1
>   mov     #11,r2
>   cmp/hs  r1,r2
> .L296:
>   bf/s    .L294
>   mov     r13,r4
>   mov.l   .L408,r0
>   jsr     @r0
>   mov     #0,r13
> 
> after:
> .L459:
>   bt      .L294
>   mov.l   @(24,r13),r1
>   tst     r1,r1
>   bt      .L295
>   add     #64,r1
>   mov     r13,r2
>   add     #64,r2
>   mov.l   @(36,r1),r1
>   mov.l   @(32,r2),r2
>   sub     r2,r1
>   mov     #11,r2
>   cmp/hs  r1,r2
>   movt    r1
> .L296:
>   tst     r1,r1
>   bt/s    .L294
>   mov     r13,r4
>   mov.l   .L408,r0
>   jsr     @r0
>   mov     #0,r13


That case didn't get worse, it actually improved.  The 'before' code is wrong
code, due to a missed BB that sets the tested 'r1' reg to '1'.

Testing the previous version of the RTL pass (attachment 30889) against trunk
rev 202876 revealed a defect in the function 'trace_reg_uses'.  The attached
updated version fixes this.

[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-09-24 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #30689|0                           |1
        is obsolete|                            |

--- Comment #68 from Oleg Endo  ---
Created attachment 30889
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30889&action=edit
RTL pass

An updated patch that adds an SH specific RTL pass against current trunk (rev
202873), not fully tested.

CSiBE for '-m2a-single -O2' and '-m4-single -mpretend-cmove -O2' look OK. 
There are only 2 cases that got actually worse in the set:

linux-2.4.23-pre3-testplatform/fs/lockd/host.s (nlm_lookup_host):

before:
.L142:
bt      .L60
mov.l   @(20,r11),r6
cmp/eq  r6,r10
bf      .L58
add     r1,r13

after:
.L142:
bt      .L60
mov.l   @(20,r11),r6
mov     r10,r5
cmp/eq  r6,r5
bf      .L58
add     r1,r13


linux-2.4.23-pre3-testplatform/net/ipv4/igmp.s (add_grec):

before:
.L459:
bt      .L294
mov.l   @(24,r13),r1
tst     r1,r1
bt/s    .L295
add     #64,r1
mov     r13,r2
add     #64,r2
mov.l   @(36,r1),r1
mov.l   @(32,r2),r2
sub     r2,r1
mov     #11,r2
cmp/hs  r1,r2
.L296:
bf/s    .L294
mov     r13,r4
mov.l   .L408,r0
jsr     @r0
mov     #0,r13

after:
.L459:
bt      .L294
mov.l   @(24,r13),r1
tst     r1,r1
bt      .L295
add     #64,r1
mov     r13,r2
add     #64,r2
mov.l   @(36,r1),r1
mov.l   @(32,r2),r2
sub     r2,r1
mov     #11,r2
cmp/hs  r1,r2
movt    r1
.L296:
tst     r1,r1
bt/s    .L294
mov     r13,r4
mov.l   .L408,r0
jsr     @r0
mov     #0,r13


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-08-22 Thread kkojima at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #67 from Kazumoto Kojima  ---
(In reply to Oleg Endo from comment #66)
> Kaz, the "WIP status" aside, would you be OK with something like that?

Yep.  Sounds good to me.


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-08-22 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #66 from Oleg Endo  ---
Created attachment 30689
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30689&action=edit
WIP RTL pass

Just wanted to give an update on the issue.

I've been writing an SH specific RTL pass that handles those multiple BB cases
as a replacement for the splitter in *cbranch_t pattern.
Basically it tries to combine comparisons and T bit cstores before cbranches
across multiple blocks.

There are still quite some open issues and some copy pasta to be folded, but
the pass can already eliminate the test cases mentioned before.  Moreover, it
also optimizes DImode comparisons and can utilize SH2A's nott instruction
better.  In order to get good results, the pass has to be run twice.

I've developed this against rev. 201282 so it also needs some adaptation for
the new passes stuff that's been done recently on trunk.

Kaz, the "WIP status" aside, would you be OK with something like that?


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-07-31 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #65 from Oleg Endo  ---
(In reply to Oleg Endo from comment #64)
> 
> would be simplified to this:
> 
> mov.l   @(4,r4),r1
> tst     r1,r1   // T = @(4,r4) == 0
> .L3:
> bt/s    .L5
> mov     #1,r1
> cmp/hi  r1,r5
> bf/s    .L9
> mov     #0,r0
> rts
> nop
> .L2:
> mov.l   @r4,r1
> bra     .L3
> tst     r1,r1   // T = @(r4) == 0

Sorry, I got confused.  The above is wrong.  One of the T bit inversions can't
be eliminated in this case.
It should be:

mov.l   @(4,r4),r1
.L3:
tst     r1,r1
bt/s    .L5
mov     #1,r1
cmp/hi  r1,r5
bf/s    .L9
mov     #0,r0
rts
nop
.L2:
mov.l   @r4,r1
tst     r1,r1
bra     .L3
movt    r1


Or SH2A:
mov.l   @(4,r4),r1
tst     r1,r1
.L3:
bt/s    .L5
mov     #1,r1
cmp/hi  r1,r5
bf/s    .L9
mov     #0,r0
rts
nop
.L2:
mov.l   @r4,r1
tst     r1,r1
bra     .L3
nott
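
Why the first attempt is wrong and the two versions above are OK can be
checked with a small C model of the T bit as seen by the "bt/s .L5".  This is
only a sketch with made-up function names.  'via_l2' selects the path through
.L2, 'mem' is the loaded word, and the reference follows the annotated asm
from comment #64 (r1 = @(4,r4) != 0 on the fall-through path, r1 = @(r4) == 0
on the .L2 path).

```c
#include <assert.h>

/* Unoptimized code: cstore in each block, then tst r1,r1 at .L3.  */
int t_reference (int via_l2, int mem)
{
  int r1 = via_l2 ? (mem == 0)   /* tst ; movt */
                  : (mem != 0);  /* tst ; mov #-1 ; negc */
  return r1 == 0;                /* .L3: tst r1,r1 */
}

/* Withdrawn version: only a single tst on either path.  */
int t_withdrawn (int via_l2, int mem)
{
  (void) via_l2;
  return mem == 0;
}

/* Corrected version: tst stays at .L3, the .L2 path keeps tst + movt.  */
int t_corrected (int via_l2, int mem)
{
  int r1 = mem;
  if (via_l2)
    r1 = (r1 == 0);              /* tst r1,r1 ; movt r1 */
  return r1 == 0;                /* .L3: tst r1,r1 */
}

/* SH2A version: no tst at .L3, the .L2 path uses tst + nott.  */
int t_corrected_sh2a (int via_l2, int mem)
{
  int t = (mem == 0);            /* tst r1,r1 */
  if (via_l2)
    t = !t;                      /* nott */
  return t;
}

/* Assert that both corrected versions match the reference on one path.  */
void check_path (int via_l2)
{
  static const int samples[] = { 0, 1, -5, 42 };

  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      int ref = t_reference (via_l2, samples[i]);
      assert (t_corrected (via_l2, samples[i]) == ref);
      assert (t_corrected_sh2a (via_l2, samples[i]) == ref);
    }
}
```

On the .L2 path the withdrawn version yields the inverted T bit, e.g.
t_withdrawn (1, 0) is 1 while t_reference (1, 0) is 0.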

However, my original 'optimized' asm snippet is valid if the reduced test case
is changed to:

static inline int
blk_oversized_queue (int* q)
{
  if (q[2])
return q[1] == 0;   // instead of != 0
  return q[0] == 0;
}

The current trunk version eliminates the movt/tst insns and produces correct
code by accident.  It can be simplified even more:

mov.l   @(4,r4),r1
.L3:
tst     r1,r1
bt/s    .L5
mov     #1,r1
cmp/hi  r1,r5
bf/s    .L9
mov     #0,r0
rts
nop
.L2:
bra     .L3
mov.l   @r4,r1

I'm trying to come up with a patch that implements T bit tracing in order to
handle those scenarios.


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-07-28 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #64 from Oleg Endo  ---
(In reply to Laurent Aflonsi from comment #61)
> 
> The movt(L2) and the tst(L3) are both removed, and that's coherent for that
> run path, because it is preceded by the tst r2,r2.
> But that makes the first path incoherent because L3 can be reached by the
> very first block. I have written a first fix, too restrictive ("pr25869-19.c
> scan-assembler-not movt" is failing) :
> 
> --- ./gcc/gcc/config/sh/sh.md.orig
> +++ ./gcc/gcc/config/sh/sh.md
> @@ -8523,7 +8523,8 @@
>T bit.  Notice that some T bit stores such as negc also modify
>the T bit.  */
> if (modified_between_p (get_t_reg_rtx (), s1.insn, testing_insn)
> -   || modified_in_p (get_t_reg_rtx (), s1.insn))
> +   || modified_in_p (get_t_reg_rtx (), s1.insn)
> +   || !no_labels_between_p(s1.insn, testing_insn))
>   operands[2] = NULL_RTX;
>  
> break;
> 
> The idea would be to check if "s1.insn block dominates testing_insn block",
> but I don't know how to write it at this stage.

The proper way would be to find all basic blocks that set the tested reg.  With
the reduced test case, just right before the split1 pass there are two basic
blocks that set reg 167 which is then tested for '== 0' before the conditional
branch:

(note 13 12 14 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
<...>
(insn 15 14 16 3 (set (reg:SI 147 t)
(eq:SI (reg:SI 173 [ MEM[(int *)q_3(D) + 4B] ])
(const_int 0 [0]))) sh_tmp.cpp:84 17 {cmpeqsi_t}
 (expr_list:REG_DEAD (reg:SI 173 [ MEM[(int *)q_3(D) + 4B] ])
(nil)))

(insn 16 15 17 3 (set (reg:SI 175)
(const_int -1 [0x])) sh_tmp.cpp:84 250 {movsi_ie}
 (nil))
(note 17 16 18 3 NOTE_INSN_DELETED)
(insn 18 17 71 3 (parallel [
(set (reg:SI 167 [ D.1424 ])
(xor:SI (reg:SI 147 t)
(const_int 1 [0x1])))
(set (reg:SI 147 t)
(const_int 1 [0x1]))
(use (reg:SI 175))
]) sh_tmp.cpp:84 394 {movrt_negc}
 (expr_list:REG_DEAD (reg:SI 175)
(expr_list:REG_UNUSED (reg:SI 147 t)
(nil
(jump_insn 71 18 72 3 (set (pc)
(label_ref 27)) -1
 (nil)
 -> 27)
(barrier 72 71 21)


(code_label 21 72 22 4 2 "" [1 uses])
(note 22 21 23 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
<...>
(insn 24 23 26 4 (set (reg:SI 147 t)
(eq:SI (reg:SI 177 [ *q_3(D) ])
(const_int 0 [0]))) sh_tmp.cpp:85 17 {cmpeqsi_t}
 (expr_list:REG_DEAD (reg:SI 177 [ *q_3(D) ])
(nil)))
(insn 26 24 27 4 (set (reg:SI 167 [ D.1424 ])
(reg:SI 147 t)) sh_tmp.cpp:85 392 {movt}
 (expr_list:REG_DEAD (reg:SI 147 t)
(nil)))



(code_label 27 26 28 5 3 "" [1 uses])
(note 28 27 29 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(insn 29 28 30 5 (set (reg:SI 147 t)
(eq:SI (reg:SI 167 [ D.1424 ])
(const_int 0 [0]))) sh_tmp.cpp:91 17 {cmpeqsi_t}
 (expr_list:REG_DEAD (reg:SI 167 [ D.1424 ])
(nil)))
(jump_insn 30 29 31 5 (set (pc)
(if_then_else (ne (reg:SI 147 t)
(const_int 0 [0]))
(label_ref:SI 50)
(pc))) sh_tmp.cpp:91 295 {*cbranch_t}
 (expr_list:REG_DEAD (reg:SI 147 t)
(expr_list:REG_BR_PROB (const_int 400 [0x190])
(nil)))
 -> 50)


Here it starts walking up the insns from insn 29 [bb 5] and finds insn 26 [bb
4], but it should also check [bb 3].
The question then is, what to do with the collected basic blocks.  Ideally it
should look at all the T bit paths in every basic block and try to eliminate
redundant T bit flipping in each basic block so that in this case [bb 5] can
start with the conditional branch.
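
Collecting all the basic blocks that set the tested reg could look roughly
like the following toy C code.  This is only a sketch on a made-up CFG
representation (toy_bb and collect_setters are hypothetical names, not GCC
internals).  It walks the predecessors backwards from the block with the
test, steps over blocks that don't set the reg and records every block that
does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_BB = 16, MAX_PREDS = 4 };

/* Hypothetical, minimal basic block description.  */
struct toy_bb
{
  bool sets_reg;            /* block contains a set of the traced reg */
  int n_preds;
  int preds[MAX_PREDS];     /* indices of the predecessor blocks */
};

/* Starting at block START (which is assumed not to set the reg before
   testing it), collect the indices of all blocks whose set can reach
   START.  Returns the number of blocks written to OUT, or -1 if some
   path reaches a block without predecessors before finding a set, or
   if OUT is too small.  */
int collect_setters (const struct toy_bb *bbs, int n_bbs, int start,
                     int *out, int max_out)
{
  bool visited[MAX_BB] = { false };
  int work[MAX_BB];
  int n_work = 0, n_out = 0;

  assert (n_bbs <= MAX_BB && start >= 0 && start < n_bbs);
  visited[start] = true;
  work[n_work++] = start;

  while (n_work > 0)
    {
      const struct toy_bb *bb = &bbs[work[--n_work]];

      if (bb->n_preds == 0)
        return -1;              /* reached the entry without a set */

      for (int i = 0; i < bb->n_preds; i++)
        {
          int p = bb->preds[i];

          assert (p >= 0 && p < n_bbs);
          if (visited[p])
            continue;
          visited[p] = true;

          if (bbs[p].sets_reg)
            {
              if (n_out == max_out)
                return -1;
              out[n_out++] = p;     /* stop tracing on this path */
            }
          else
            work[n_work++] = p;     /* step over it, keep going up */
        }
    }
  return n_out;
}
```

For the RTL above, with [bb 5] having the predecessors [bb 3] and [bb 4] and
both of them setting reg 167, the result is the two blocks 3 and 4 instead of
only [bb 4].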

Then this ...
mov.l   @(4,r4),r1
tst     r1,r1   // T = @(4,r4) == 0
mov     #-1,r1
negc    r1,r1   // r1 = @(4,r4) != 0
.L3:
tst     r1,r1   // T = @(4,r4) == 0
bt/s    .L5
mov     #1,r1
cmp/hi  r1,r5
bf/s    .L9
mov     #0,r0
rts
nop
.L2:
mov.l   @r4,r1
tst     r1,r1   // T = @(r4) == 0
bra     .L3
movt    r1      // r1 = @(r4) == 0


would be simplified to this:

mov.l   @(4,r4),r1
tst r1,r1   // T = @(4,r4) == 0
.L3:
bt/s    .L5
mov #1,r1
cmp/hi  r1,r5
bf/s    .L9
mov #0,r0
rts
nop
.L2:
mov.l   @r4,r1
bra .L3
tst r1,r1   // T = @(r4) == 0


Maybe if BImode was used for the T bit, combine could do better at folding T
bit flipping.  However, it would not do cross BB analysis, so I think it's
pointless to try out BImode.
I'm not sure whether there is already something in the compiler that could do
this kind of optimization.  According to my observations it should happen after
the combine pass and before register allocation to get useful results.

Until then I think the following should be applied to 4.9 and 4.

[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-07-28 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #63 from Oleg Endo  ---
Created attachment 30566
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30566&action=edit
Reduced test

(In reply to Laurent Aflonsi from comment #58)
> Created attachment 30524 [details]
> functional regression

This is a stripped down test case.


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-07-27 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #62 from Oleg Endo  ---
(In reply to Laurent Aflonsi from comment #61)
> 
> More generally, I'm surprised to see that optimization at mapping level,
> isn't this a generic problem that should be handled at rtl dead code
> elimination stage on the T bit register ?

Actually, it is a kind of generic case.  Dead code elimination would not do
this kind of logic folding.  Usually this kind of thing is handled by the
combine pass, which can figure out some redundant operations or operations that
cancel each other out.  However, combine's logic is also limited and the overall
T bit handling is a bit shaky.  That's why I introduced the additional
elimination handling that is done in the split pass after the combine pass on
insns that combine didn't catch.  I didn't want to introduce another rtl pass
just for this, and touching the combine pass also didn't seem attractive since
all the other backends depend on its behavior.

Maybe it would be better to switch T_REG from SImode to BImode, which reflects
reality.  This should be relatively straightforward to do.

Another idea would be to try out using CCmode.  There are some additional
optimizations done on CCmode.  However, this is a bigger change.


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-07-23 Thread laurent.alfonsi at st dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #61 from Laurent Aflonsi  ---
Yes that's the point. L3 can be reached by another block (L2):

        tst     r2,r2
        mov     #-1,r2
        negc    r2,r2
.L3:
        tst     r2,r2
        bt/s    .L11
        [...]
.L2:
        mov.l   @r4,r2
        tst     r2,r2
        bra     .L3
        movt    r2

The movt (.L2) and the tst (.L3) are both removed, and that is coherent for that
run path, because it is preceded by the tst r2,r2.
But it makes the first path incoherent, because .L3 can also be reached from the
very first block.  I have written a first fix, but it is too restrictive
("pr25869-19.c scan-assembler-not movt" is failing):

--- ./gcc/gcc/config/sh/sh.md.orig
+++ ./gcc/gcc/config/sh/sh.md
@@ -8523,7 +8523,8 @@
   T bit.  Notice that some T bit stores such as negc also modify
   the T bit.  */
if (modified_between_p (get_t_reg_rtx (), s1.insn, testing_insn)
-   || modified_in_p (get_t_reg_rtx (), s1.insn))
+   || modified_in_p (get_t_reg_rtx (), s1.insn)
+   || !no_labels_between_p(s1.insn, testing_insn))
  operands[2] = NULL_RTX;

break;

The idea would be to check if "s1.insn block dominates testing_insn block",
but I don't know how to write it at this stage.
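
For illustration only, a dominance test on a toy CFG could look like the sketch below.  It is not GCC code and does not use GCC's own dominance infrastructure; `ToyCfg`, `toy_dominates` and `get_request_cfg` are made-up names, and the block numbering (0 = entry with "bt .L2", 1 = the tst/mov/negc block, 2 = .L2, 3 = .L3) is an assumption derived from the listing above.  The check is the naive one: block a dominates block b if b is no longer reachable from the entry once a is removed.

```cpp
#include <map>
#include <set>
#include <vector>

// Toy CFG as successor lists; block numbers are illustrative only.
using ToyCfg = std::map<int, std::vector<int>>;

// `a` dominates `b` if every path from `entry` to `b` passes through `a`.
// Naive check: walk the CFG from `entry` without entering `a` and see
// whether `b` can still be reached.
bool toy_dominates(const ToyCfg& cfg, int entry, int a, int b) {
    if (a == b)
        return true;
    std::set<int> seen;
    std::vector<int> work;
    if (entry != a)
        work.push_back(entry);
    while (!work.empty()) {
        int cur = work.back();
        work.pop_back();
        if (cur == a || !seen.insert(cur).second)
            continue;  // never walk through `a`, never revisit a block
        if (cur == b)
            return false;  // found a path to `b` that avoids `a`
        auto it = cfg.find(cur);
        if (it != cfg.end())
            work.insert(work.end(), it->second.begin(), it->second.end());
    }
    return true;
}

// Assumed shape of _get_request: the entry branches to .L2 or falls through
// into the negc block, and both of those reach .L3.
ToyCfg get_request_cfg() {
    return ToyCfg{{0, {1, 2}}, {1, {3}}, {2, {3}}, {3, {}}};
}
```

On this CFG neither the negc block (1) nor .L2 (2) dominates .L3 (3), so a store found in either of them could not be combined with the test at .L3 on its own; only the entry block dominates it.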

More generally, I'm surprised to see that optimization at mapping level, isn't
this a generic problem that should be handled at rtl dead code elimination
stage on the T bit register ?

Thanks,
Laurent


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-07-20 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #60 from Oleg Endo  ---
(In reply to Laurent Aflonsi from comment #59)
> I have a functional regression due to this improvement when we are compiling
> the enclosed example in -O2.
>  $ sh-superh-elf-gcc -O2 pr51244-20-main.c pr51244-20.c
>  $ sh-superh-elf-run a.out
>  FAIL
> 
> Thus, the code is transformed from :
>   _get_request:
>   mov.l   @(12,r4),r1
>   tst r1,r1
>   bt  .L2
>   mov.l   @(4,r4),r2
>   tst r2,r2
>   mov #-1,r2
>   negc    r2,r2
>   .L3:
>   tst r2,r2
>   bt/s    .L11
>   mov #-100,r0
>   mov #1,r2
> [...]
> 
> to : 
>   _get_request:
>   mov.l   @(12,r4),r1
>   tst r1,r1
>   bt  .L2
>   mov.l   @(4,r4),r2
>   tst r2,r2
>   mov #-1,r2
>   negc    r2,r2
>   .L3:
>   bf/s    .L11
>   mov #-100,r0
>   mov #1,r2
> [...]
> 
> With the inputs encoded in the main function, we are supposed to follow the
> simplest flow (no jump), but when this optimization is enabled, we are
> jumping to L11 due to the bt -> bf transformation.

The idea was that sequences such as
  tst r2,r2
  mov #-1,r2
  negc r2,r2
  tst r2,r2
  bt  ...

should be folded to
  tst r2,r2
  bt  ...

... if r2 is dead afterwards (which it seems to be).  I guess I failed to
handle some cases where the tested register is in a loop or can be reached by
some other basic block.  I'll check out the details.
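
As a sanity check of the folding idea, here is a small model of the T bit for the two ways of reaching the tst at .L3.  It is only a sketch: `ShState` and the helper names are made up, and the instruction semantics are written down from the SH instruction descriptions (tst Rn,Rn sets T = (Rn == 0); negc Rm,Rn computes 0 - Rm - T with the borrow going to T; movt Rn copies T into Rn).  The .L2 path is the one from comment #61 (movt r2 followed by the tst at .L3).

```cpp
#include <cstdint>

// Minimal model of the state touched by the sequences above.
struct ShState {
    std::uint32_t r;  // the working register (r2 in the listings)
    bool t;           // the T bit
};

// tst Rn,Rn: T = (Rn == 0)
inline void tst(ShState& s) { s.t = (s.r == 0); }

// mov #-1,Rn ; negc Rn,Rn: Rn = 0 - (-1) - T = 1 - T, T = borrow
inline void mov_m1_negc(ShState& s) {
    const std::uint32_t rm = 0xFFFFFFFFu;
    const std::uint32_t tmp = 0u - rm;
    const std::uint32_t res = tmp - (s.t ? 1u : 0u);
    s.t = (0u < tmp) || (tmp < res);  // borrow out of the subtraction
    s.r = res;
}

// movt Rn: Rn = T
inline void movt(ShState& s) { s.r = s.t ? 1u : 0u; }

// T after only the first tst.
bool t_after_first_tst(std::uint32_t x) {
    ShState s{x, false};
    tst(s);
    return s.t;
}

// T at the branch when .L3 is reached through tst / mov #-1 / negc / tst.
bool t_at_L3_via_negc(std::uint32_t x) {
    ShState s{x, false};
    tst(s);
    mov_m1_negc(s);
    tst(s);
    return s.t;
}

// T at the branch when .L3 is reached through tst / movt / tst (the .L2 path).
bool t_at_L3_via_movt(std::uint32_t y) {
    ShState s{y, false};
    tst(s);
    movt(s);
    tst(s);
    return s.t;
}
```

In this model the negc path arrives at the branch with the same T value that the first tst produced, while the movt path arrives with the inverted value.  So the second tst is redundant with "bt" on one path and with "bf" on the other, which is why folding it based on a single predecessor cannot be right when .L3 is reachable from both.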

[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-07-18 Thread laurent.alfonsi at st dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #59 from Laurent Aflonsi  ---
I have a functional regression due to this improvement when we are compiling
the enclosed example in -O2.
 $ sh-superh-elf-gcc -O2 pr51244-20-main.c pr51244-20.c
 $ sh-superh-elf-run a.out
 FAIL

Thus, the code is transformed from :
  _get_request:
        mov.l   @(12,r4),r1
        tst     r1,r1
        bt      .L2
        mov.l   @(4,r4),r2
        tst     r2,r2
        mov     #-1,r2
        negc    r2,r2
  .L3:
        tst     r2,r2
        bt/s    .L11
        mov     #-100,r0
        mov     #1,r2
[...]

to : 
  _get_request:
        mov.l   @(12,r4),r1
        tst     r1,r1
        bt      .L2
        mov.l   @(4,r4),r2
        tst     r2,r2
        mov     #-1,r2
        negc    r2,r2
  .L3:
        bf/s    .L11
        mov     #-100,r0
        mov     #1,r2
[...]

With the inputs encoded in the main function, we are supposed to follow the
simplest flow (no jump), but when this optimization is enabled, we are jumping
to L11 due to the bt -> bf transformation.

Could you please look at it?

Thanks
Laurent


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2013-07-18 Thread laurent.alfonsi at st dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Laurent Aflonsi  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |laurent.alfonsi at st dot com

--- Comment #58 from Laurent Aflonsi  ---
Created attachment 30524
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30524&action=edit
functional regression


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2012-11-03 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #57 from Oleg Endo  2012-11-03 12:01:05 UTC ---
Author: olegendo
Date: Sat Nov  3 12:01:01 2012
New Revision: 193119

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193119
Log:
PR target/51244
* config/sh/sh.md (*cbranch_t): Allow splitting after reload.
Allow going beyond current basic block before reload when looking for
the reg set insn.
* config/sh/sh.c (sh_find_set_of_reg): Don't stop at labels.

PR target/51244
* gcc.target/sh/pr51244-18.c: New.
* gcc.target/sh/pr51244-19.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr51244-18.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-19.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog

[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2012-10-15 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #56 from Oleg Endo  2012-10-15 22:08:14 UTC ---
Author: olegendo
Date: Mon Oct 15 22:08:07 2012
New Revision: 192481

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192481
Log:
PR target/51244
* config/sh/sh-protos.h (set_of_reg): New struct.
(sh_find_set_of_reg, sh_is_logical_t_store_expr,
sh_try_omit_signzero_extend):  Declare...
* config/sh/sh.c (sh_find_set_of_reg, sh_is_logical_t_store_expr,
sh_try_omit_signzero_extend): ...these new functions.
* config/sh/sh.md (*logical_op_t): New insn_and_split.
(*zero_extendsi2_compact): Use sh_try_omit_signzero_extend
in splitter.
(*extendsi2_compact_reg): Convert to insn_and_split.
Use sh_try_omit_signzero_extend in splitter.
(*mov_reg_reg): Disallow t_reg_operand as operand 1.
(*cbranch_t): Rewrite combine part in splitter using new
sh_find_set_of_reg function.

PR target/51244
* gcc.target/sh/pr51244-17.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr51244-17.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog

[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2012-10-11 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #55 from Oleg Endo  2012-10-12 00:41:31 UTC ---
Author: olegendo
Date: Fri Oct 12 00:41:23 2012
New Revision: 192387

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192387
Log:
PR target/51244
* config/sh/sh.md (negsi_cond, negdi_cond, stack_protect_test): Remove
get_t_reg_rtx when invoking gen_branch_true or gen_branch_false.
(*zero_extendsi2_compact): Convert to insn_and_split.  Convert
zero extensions of T bit stores to reg moves in splitter.  Remove
obsolete unnamed peephole2 that caught zero extensions after negc T bit
stores.
(*branch_true_eq, *branch_false_ne): Delete.
(branch_true, branch_false): Convert insn to expander.  Move actual
insn logic to...
(*cbranch_t): ...this new insn_and_split.  Try to find preceding
redundant T bit stores and tests and combine them with the conditional
branch if possible in the splitter.
(movrt_xor, *movt_movrt): New insn_and_split.
* config/sh/predicates.md (cbranch_treg_value): New predicate.
* config/sh/sh-protos.h (sh_eval_treg_value): Forward declare...
* config/sh/sh.c (sh_eval_treg_value): ...this new function.
(expand_cbranchsi4, expand_cbranchdi4): Remove get_t_reg_rtx
when invoking gen_branch_true or gen_branch_false.

PR target/51244
* gcc.target/sh/pr51244-13.c: New.
* gcc.target/sh/pr51244-14.c: New.
* gcc.target/sh/pr51244-15.c: New.
* gcc.target/sh/pr51244-16.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr51244-13.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-14.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-15.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-16.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/predicates.md
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog

[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2012-10-03 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #54 from Oleg Endo  2012-10-03 21:39:22 UTC ---
Author: olegendo
Date: Wed Oct  3 21:39:18 2012
New Revision: 192052

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192052
Log:
PR target/51244
* config/sh/sh.md (*mov_t_msb_neg): New insn and two accompanying
unnamed split patterns.

PR target/51244
* gcc.target/sh/pr51244-12.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr51244-12.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog

[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2012-09-23 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #53 from Oleg Endo  2012-09-23 21:41:55 UTC ---
Another case that seems to go awry:

int test_1 (int a, int b, int c, int* d)
{
  bool x = a == 0;
  d[2] = !x;

  return x ? b : c;
}

-O2 -m4:
tst r4,r4
mov #1,r1
movt    r0
xor r0,r1
tst r0,r0
bt/s    .L5
mov.l   r1,@(8,r7)
mov r5,r6
.L5:
rts
mov r6,r0

This should be something like:
tst r4,r4
movt    r0
xor #1,r0
bf/s    .L5
mov.l   r1,@(8,r7)
mov r5,r6
.L5:
rts
mov r6,r0


-O2 -m2a:
tst r4,r4
movt    r0
mov #1,r1
xor r0,r1
mov.l   r1,@(8,r7)
tst r0,r0
bf  .L6
mov r6,r0
rts/n
.align 1
.L6:
rts
mov r5,r0

This should be:
tst r4,r4
movrt   r1
mov.l   r1,@(8,r7)
bt  .L6
mov r6,r0
rts/n
.align 1
.L6:
rts
mov r5,r0


[Bug target/51244] [SH] Inefficient conditional branch and code around T bit

2012-09-23 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|SH Target: Inefficient      |[SH] Inefficient
                   |conditional branch          |conditional branch and code
                   |                            |around T bit

--- Comment #52 from Oleg Endo  2012-09-04 08:03:08 UTC ---
Author: olegendo
Date: Tue Sep  4 08:03:01 2012
New Revision: 190909

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190909
Log:
PR target/51244
* config/sh/sh.c (prepare_cbranch_operands): Pull out comparison
canonicalization code into...
* (sh_canonicalize_comparison): This new function.
* config/sh/sh-protos.h: Declare it.
* config/sh/sh.h: Use it in new macro CANONICALIZE_COMPARISON.
* config/sh/sh.md (cbranchsi4): Remove TARGET_CBRANCHDI4 check and
always invoke expand_cbranchsi4.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.h
trunk/gcc/config/sh/sh.md