[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #88 from Oleg Endo ---
Author: olegendo
Date: Tue Sep 27 12:50:27 2016
New Revision: 240533

URL: https://gcc.gnu.org/viewcvs?rev=240533&root=gcc&view=rev
Log:
gcc/
	PR target/51244
	* config/sh/sh.c (sh_rtx_costs): Fix return value of SET of movt and
	movrt patterns.  Match them before anything else in the SET case.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #87 from Oleg Endo ---
Author: olegendo
Date: Sun Sep 25 06:59:37 2016
New Revision: 240471

URL: https://gcc.gnu.org/viewcvs?rev=240471&root=gcc&view=rev
Log:
This fixes a fallout that actually goes back to 5.0 but went unnoticed.
The costs for movt and movrt type of insns were not correctly reported
and ifcvt thus made some bad choices for SH.  A new cset_zero pattern
variant is also required to fix the matching for some recent changes
in the middle end.

gcc/
	PR target/51244
	* config/sh/sh.c (sh_movt_set_dest, sh_movrt_set_dest): Add overloads.
	(sh_rtx_costs): Handle SET of movt and movrt patterns.
	* config/sh/sh-protos.h (sh_movt_set_dest, sh_movrt_set_dest): Forward
	declare new overloads.
	* config/sh/sh.md (*cset_zero): Add variant that takes a treg_set_expr
	operand.

gcc/testsuite/
	PR target/51244
	* gcc.target/sh/pr51244-11.c: Add more detailed expected insn
	matching.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr51244-11.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #86 from Oleg Endo ---
I'd like to close this PR as fixed because it's getting too long.  I'll try
to pull out the remaining issues into individual new PRs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #85 from Oleg Endo ---
Author: olegendo
Date: Sat Jan 24 13:04:53 2015
New Revision: 220081

URL: https://gcc.gnu.org/viewcvs?rev=220081&root=gcc&view=rev
Log:
gcc/
	PR target/49263
	PR target/53987
	PR target/64345
	PR target/59533
	PR target/52933
	PR target/54236
	PR target/51244
	* config/sh/sh-protos.h
	(sh_extending_set_of_reg::can_use_as_unextended_reg,
	sh_extending_set_of_reg::use_as_unextended_reg,
	sh_is_nott_insn, sh_movt_set_dest, sh_movrt_set_dest, sh_is_movt_insn,
	sh_is_movrt_insn, sh_insn_operands_modified_between_p,
	sh_reg_dead_or_unused_after_insn, sh_in_recog_treg_set_expr,
	sh_recog_treg_set_expr, sh_split_treg_set_expr): New functions.
	(sh_treg_insns): New class.
	* config/sh/sh.c (TARGET_LEGITIMATE_COMBINED_INSN): Define target hook.
	(scope_counter): New class.
	(sh_legitimate_combined_insn, sh_is_nott_insn, sh_movt_set_dest,
	sh_movrt_set_dest, sh_reg_dead_or_unused_after_insn,
	sh_extending_set_of_reg::can_use_as_unextended_reg,
	sh_extending_set_of_reg::use_as_unextended_reg,
	sh_recog_treg_set_expr, sh_in_recog_treg_set_expr,
	sh_try_split_insn_simple, sh_split_treg_set_expr): New functions.
	(addsubcosts): Handle treg_set_expr.
	(sh_rtx_costs): Handle IF_THEN_ELSE and ZERO_EXTRACT.
	(sh_rtx_costs): Use arith_reg_operand in SIGN_EXTEND and ZERO_EXTEND.
	(sh_rtx_costs): Handle additional bit test patterns in EQ and AND cases.
	(sh_insn_operands_modified_between_p): Make non-static.
	* config/sh/predicates.md (zero_extend_movu_operand): Allow
	simple_mem_operand in addition to displacement_mem_operand.
	(zero_extend_operand): Don't allow zero_extend_movu_operand.
	(treg_set_expr, treg_set_expr_not_const01,
	arith_reg_or_treg_set_expr): New predicates.
	* config/sh/sh.md (tstsi_t): Use arith_reg_operand and
	arith_or_int_operand instead of logical_operand.  Convert to
	insn_and_split.  Try to optimize constant operand in splitter.
	(tsthi_t, tstqi_t): Fold into *tst_t.  Convert to insn_and_split.
	(*tstqi_t_zero): Delete.
	(*tst_t_subregs): Add !sh_in_recog_treg_set_expr split condition.
	(tstsi_t_and_not): Delete.
	(tst_t_zero_extract_eq): Rename to *tst_t_zero_extract.  Convert to
	insn_and_split.
	(unnamed split, tstsi_t_zero_extract_xor,
	tstsi_t_zero_extract_subreg_xor_little,
	tstsi_t_zero_extract_subreg_xor_big): Delete.
	(*tstsi_t_shift_mask): New insn_and_split.
	(cmpeqsi_t, cmpgesi_t): Add new split for const_int 0 operands and try
	to recombine with surrounding insns when splitting.
	(*negtstsi): Add !sh_in_recog_treg_set_expr condition.
	(cmp_div0s_0, cmp_div0s_1, *cmp_div0s_0, *cmp_div0s_1): Rewrite as ...
	(cmp_div0s, *cmp_div0s_1, *cmp_div0s_2, *cmp_div0s_3, *cmp_div0s_4,
	*cmp_div0s_5, *cmp_div0s_6): ... these new insn_and_split patterns.
	(*cbranch_div0s): Delete.
	(*addc): Convert to insn_and_split.  Use treg_set_expr as 3rd operand.
	Try to recombine with surrounding insns when splitting.  Add operand
	order variants.
	(*addc_t_r, *addc_r_t): Use treg_set_expr_not_const01.
	(*addc_r_r_1, *addc_r_lsb, *addc_r_r_lsb, *addc_r_lsb_r, *addc_r_msb,
	*addc_r_r_msb, *addc_2r_msb): Delete.
	(*addc_2r_lsb): Rename to *addc_2r_t.  Use treg_set_expr.  Add operand
	order variant.
	(*addc_negreg_t): New insn_and_split.
	(*subc): Convert to insn_and_split.  Use treg_set_expr as 3rd operand.
	Try to recombine with surrounding insns when splitting.  Add operand
	order variants.
	(*subc_negt_reg, *subc_negreg_t, *reg_lsb_t, *reg_msb_t): New
	insn_and_split patterns.
	(*rotcr): Use arith_reg_or_treg_set_expr.  Try to recombine with
	surrounding insns when splitting.
	(unnamed rotcr split): Use arith_reg_or_treg_set_expr.
	(*rotcl): Likewise.  Add zero_extract variant.
	(*ashrsi2_31): New insn_and_split.
	(*negc): Convert to insn_and_split.  Use treg_set_expr.
	(*zero_extendsi2_disp_mem): Update comment.
	(movrt_negc, *movrt_negc, nott): Add !sh_in_recog_treg_set_expr split
	condition.
	(*mov_t_msb_neg, mov_neg_si_t): Use treg_set_expr.  Try to recombine
	with surrounding insns when splitting.
	(any_treg_expr_to_reg): New insn_and_split.
	(*neg_zero_extract_0, *neg_zero_extract_1, *neg_zero_extract_2,
	*neg_zero_extract_3, *neg_zero_extract_4, *neg_zero_extract_5,
	*neg_zero_extract_6, *zero_extract_0, *zero_extract_1,
	*zero_extract_2): New single bit zero extract patterns.
	(bld_reg, *bld_regqi): Fold into bld_reg.
	(*get_thread_pointersi, store_gbr, *mov_gbr_load, *mov_gbr_load,
	*mov_gbr_load, *mov_gbr_load, *movdi_gbr_load): Use arith_reg_dest
	instead of register_operand for set destination.
	(set_thread_pointersi, load_gbr): Use arith_reg_operand instead of
	register_operand for set source.

gcc/testsuite/
	PR target/49263
	PR target/53987
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #84 from Oleg Endo ---
Author: olegendo
Date: Wed Dec 24 21:55:59 2014
New Revision: 219062

URL: https://gcc.gnu.org/viewcvs?rev=219062&root=gcc&view=rev
Log:
gcc/
	PR target/51244
	* config/sh/sh.md (*mov_t_msb_neg): Convert split into insn_and_split.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #83 from Oleg Endo ---
(In reply to Oleg Endo from comment #71)
> * The RTL pass does the treg combine only when there is a conditional
>   branch.  It should also handle conditional move insns (-mpretend-cmove).

It does now.  It also handles nott cbranch sequences by inverting the branch
condition and deleting the nott insn.
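The "invert the branch and delete the nott" step can be illustrated with a toy peephole in C. This is only a sketch and not the sh_treg_combine code: the insn kinds and fold_nott_cbranch are hypothetical, and the sketch assumes the T bit is dead after the branch (after "nott; bt L" the T bit holds the inverted value on both edges, after "bf L" it does not), which a real pass has to verify.

```c
#include <stddef.h>

/* Hypothetical toy IR: only the insn kinds needed for this example. */
typedef enum { INSN_OTHER, INSN_NOTT, INSN_BT, INSN_BF, INSN_DELETED } insn_kind;

typedef struct {
    insn_kind kind;
    int label; /* branch target for INSN_BT / INSN_BF */
} insn;

/* Rewrite "nott; bt L" into "bf L" and "nott; bf L" into "bt L".
   Assumes T is dead after the branch.  Returns the number of rewrites. */
static int fold_nott_cbranch(insn *insns, size_t n)
{
    int changed = 0;
    for (size_t i = 0; i + 1 < n; ++i) {
        if (insns[i].kind != INSN_NOTT)
            continue;
        insn *br = &insns[i + 1];
        if (br->kind == INSN_BT)
            br->kind = INSN_BF;          /* branch on the inverted condition */
        else if (br->kind == INSN_BF)
            br->kind = INSN_BT;
        else
            continue;                    /* nott not followed by a cbranch */
        insns[i].kind = INSN_DELETED;    /* the nott is no longer needed */
        ++changed;
    }
    return changed;
}
```

The branch target is left untouched; only the sense of the test flips, which is what makes the nott redundant.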
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #82 from Oleg Endo ---
Author: olegendo
Date: Wed Dec 17 23:08:14 2014
New Revision: 218850

URL: https://gcc.gnu.org/viewcvs?rev=218850&root=gcc&view=rev
Log:
gcc/
	PR target/51244
	* config/sh/sh_treg_combine.cc (is_conditional_insn): New function.
	(cbranch_trace): Add member rtx* condition_rtx_in_insn, initialize it
	accordingly in constructor.
	(cbranch_trace::branch_condition_rtx_ref): New function.
	(cbranch_trace::branch_condition_rtx): Use branch_condition_rtx_ref.
	(sh_treg_combine::try_invert_branch_condition): Invert condition rtx
	in insn using reversed_comparison_code and validate_change instead of
	invert_jump_1.
	(sh_treg_combine::execute): Look for conditional insns in basic blocks
	in addition to conditional branches.
	* config/sh/sh.md (*movsicc_div0s): Remove combine patterns.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/config/sh/sh_treg_combine.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #81 from Oleg Endo ---
Author: olegendo
Date: Wed Dec 17 22:52:21 2014
New Revision: 218847

URL: https://gcc.gnu.org/viewcvs?rev=218847&root=gcc&view=rev
Log:
gcc/
	PR target/51244
	* config/sh/sh_treg_combine.cc (sh_treg_combine::try_optimize_cbranch):
	Combine ccreg inversion and cbranch into inverted cbranch.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh_treg_combine.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #80 from Oleg Endo ---
Author: olegendo
Date: Mon Dec 1 06:50:06 2014
New Revision: 218200

URL: https://gcc.gnu.org/viewcvs?rev=218200&root=gcc&view=rev
Log:
gcc/
	PR target/63986
	PR target/51244
	* config/sh/sh.c (sh_unspec_insn_p,
	sh_insn_operands_modified_between_p): New functions.
	(sh_split_movrt_negc_to_movt_xor): Do not delete insn if its operands
	are modified or if it has side effects, may trap or is volatile.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #79 from Oleg Endo ---
Author: olegendo
Date: Sat Nov 22 16:07:25 2014
New Revision: 217970

URL: https://gcc.gnu.org/viewcvs?rev=217970&root=gcc&view=rev
Log:
gcc/
	Backport from mainline
	2014-11-22  Oleg Endo

	PR target/63783
	PR target/51244
	* config/sh/sh_treg_combine.cc (sh_treg_combine::make_not_reg_insn):
	Do not emit bitwise not insn.  Emit logical not insn sequence instead.
	Adjust related comments throughout the file.

gcc/testsuite/
	Backport from mainline
	2014-11-22  Oleg Endo

	PR target/63783
	PR target/51244
	* gcc.target/sh/torture/pr63783-1.c: New.
	* gcc.target/sh/torture/pr63783-2.c: New.
	* gcc.target/sh/pr51244-20.c: Adjust.
	* gcc.target/sh/pr51244-20-sh2a.c: Adjust.

Added:
    branches/gcc-4_9-branch/gcc/testsuite/gcc.target/sh/torture/pr63783-1.c
    branches/gcc-4_9-branch/gcc/testsuite/gcc.target/sh/torture/pr63783-2.c
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/config/sh/sh_treg_combine.cc
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_9-branch/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
    branches/gcc-4_9-branch/gcc/testsuite/gcc.target/sh/pr51244-20.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #78 from Oleg Endo ---
Author: olegendo
Date: Sat Nov 22 15:50:10 2014
New Revision: 217969

URL: https://gcc.gnu.org/viewcvs?rev=217969&root=gcc&view=rev
Log:
gcc/
	PR target/63783
	PR target/51244
	* config/sh/sh_treg_combine.cc (sh_treg_combine::make_not_reg_insn):
	Do not emit bitwise not insn.  Emit logical not insn sequence instead.
	Adjust related comments throughout the file.

gcc/testsuite/
	PR target/63783
	PR target/51244
	* gcc.target/sh/torture/pr63783-1.c: New.
	* gcc.target/sh/torture/pr63783-2.c: New.
	* gcc.target/sh/pr51244-20.c: Adjust.
	* gcc.target/sh/pr51244-20-sh2a.c: Adjust.

Added:
    trunk/gcc/testsuite/gcc.target/sh/torture/pr63783-1.c
    trunk/gcc/testsuite/gcc.target/sh/torture/pr63783-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh_treg_combine.cc
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-20.c
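The log does not say why a bitwise not is the wrong operation here. A plausible reading, offered only as an assumption, is that the register being inverted holds a copied T bit, i.e. the value 0 or 1, and bitwise inversion does not flip the truth value of such a register. A minimal C illustration (the helper names are made up):

```c
#include <stdint.h>

/* A T bit stored in a register (e.g. by movt) is either 0 or 1. */

/* Bitwise NOT: 0 -> 0xFFFFFFFF, 1 -> 0xFFFFFFFE.  Both results are
   nonzero, so a later "tst rN,rN" sees "true" in both cases. */
static uint32_t bitwise_not(uint32_t tval) { return ~tval; }

/* Logical NOT: 0 -> 1, 1 -> 0, which is what inverting T requires. */
static uint32_t logical_not(uint32_t tval) { return tval == 0; }

/* For values known to be 0 or 1, xor with 1 is an equivalent logical NOT
   (compare the movt/xor #1 sequence used elsewhere in this PR). */
static uint32_t xor_one(uint32_t tval) { return tval ^ 1u; }
```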
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #77 from Oleg Endo ---
Author: olegendo
Date: Sat Nov 22 15:06:34 2014
New Revision: 217968

URL: https://gcc.gnu.org/viewcvs?rev=217968&root=gcc&view=rev
Log:
gcc/
	PR target/63986
	PR target/51244
	* config/sh/sh.c (sh_is_logical_t_store_expr,
	sh_try_omit_signzero_extend): Use rtx_insn* for insn argument.
	(sh_split_movrt_negc_to_movt_xor): New function.
	(sh_find_set_of_reg): Move to ...
	* config/sh/sh-protos.h (sh_find_set_of_reg): ... here and convert to
	template function.
	(set_of_reg): Use rtx_insn* for insn member.
	(sh_is_logical_t_store_expr, sh_try_omit_signzero_extend): Use
	rtx_insn* for insn argument.
	* config/sh/sh.md (movrt_negc, *movrt_negc): Split into movt-xor
	sequence using new sh_split_movrt_negc_to_movt_xor function.
	(movrt_xor): Allow also for SH2A.
	(*movt_movrt): Delete insns and splits.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #76 from Oleg Endo ---
When compiling the libgcc divsc3 from PR 55212 with "-O2 -m2 -ml" (on the
sh-lra branch) the following sequences are generated:

        tst     r0,r0
        subc    r0,r0     ! r0: T == 0 -> 0x00000000, T == 1 -> 0xFFFFFFFF
        not     r0,r0     ! r0: T == 0 -> 0xFFFFFFFF, T == 1 -> 0x00000000
        and     #1,r0     ! r0: T == 0 -> 1, T == 1 -> 0

which can be done better as:

        tst     r0,r0
        mov     #-1,r0
        negc    r0,r0

or

        tst     r0,r0
        movt    r0
        xor     #1,r0

and on SH2A:

        tst     r0,r0
        movrt   r0

combine is looking for the following patterns:

Failed to match this instruction:
(set (reg:SI 296 [ D.1371 ])
    (and:SI (not:SI (reg:SI 147 t))
        (const_int 1 [0x1])))

Failed to match this instruction:
(set (reg:SI 147 t)
    (and:SI (reg:SI 147 t)
        (const_int 1 [0x1])))

(and:SI (reg:SI T_REG) (const_int 1)) is effectively a T -> T nop move, which
is supposed to be handled by the "*movtt" insn.  Maybe the case above and the
original eq:SI case in "*movtt" should be added to the t_reg_operand
predicate.  Then the "*movtt" pattern could be simplified to:

(define_insn_and_split "*movtt"
  [(set (reg:SI T_REG)
        (match_operand 0 "t_reg_operand"))]
  ...
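To double-check that the three proposed replacements compute the same value as the generated subc/not/and sequence, here is a small C model of the SH instructions involved. It is a sketch only: sh_state and the seq_* helpers are made up for illustration, and just the semantics needed here are modelled (tst sets T = (Rn & Rm) == 0; subc and negc compute Rn - Rm - T and 0 - Rm - T respectively and set T to the borrow).

```c
#include <stdint.h>

/* Minimal model of the state these sequences touch: r0 and the T bit. */
typedef struct {
    uint32_t r0;
    uint32_t t; /* 0 or 1 */
} sh_state;

/* tst r0,r0: T = (r0 & r0) == 0 */
static void tst_r0_r0(sh_state *s) { s->t = (s->r0 == 0); }

/* subc r0,r0: r0 = r0 - r0 - T, T = borrow.  Net effect: r0 = -T. */
static void subc_r0_r0(sh_state *s)
{
    uint64_t wide = (uint64_t)s->r0 - s->r0 - s->t;
    s->r0 = (uint32_t)wide;
    s->t = (wide >> 32) != 0; /* the 64-bit result wrapped: borrow */
}

/* negc r0,r0: r0 = 0 - r0 - T, T = borrow. */
static void negc_r0_r0(sh_state *s)
{
    uint64_t wide = (uint64_t)0 - s->r0 - s->t;
    s->r0 = (uint32_t)wide;
    s->t = (wide >> 32) != 0;
}

/* Generated today: tst; subc; not; and #1 */
static uint32_t seq_subc_not_and(uint32_t x)
{
    sh_state s = { x, 0 };
    tst_r0_r0(&s);
    subc_r0_r0(&s);     /* r0 = 0x00000000 (T == 0) or 0xFFFFFFFF (T == 1) */
    s.r0 = ~s.r0;       /* not r0,r0 */
    s.r0 &= 1u;         /* and #1,r0 */
    return s.r0;
}

/* Proposed: tst; mov #-1,r0; negc r0,r0  ->  r0 = 1 - T */
static uint32_t seq_negc(uint32_t x)
{
    sh_state s = { x, 0 };
    tst_r0_r0(&s);
    s.r0 = 0xFFFFFFFFu; /* mov #-1,r0 (immediate is sign-extended) */
    negc_r0_r0(&s);
    return s.r0;
}

/* Proposed: tst; movt r0; xor #1,r0 */
static uint32_t seq_movt_xor(uint32_t x)
{
    sh_state s = { x, 0 };
    tst_r0_r0(&s);
    s.r0 = s.t;         /* movt r0 */
    s.r0 ^= 1u;         /* xor #1,r0 */
    return s.r0;
}

/* Proposed for SH2A: tst; movrt r0  ->  r0 = !T */
static uint32_t seq_movrt(uint32_t x)
{
    sh_state s = { x, 0 };
    tst_r0_r0(&s);
    s.r0 = s.t ^ 1u;    /* movrt r0 */
    return s.r0;
}
```

For every input x all four functions return x != 0, i.e. the logical inversion of the T bit produced by tst r0,r0; the shorter sequences just get there without the intermediate all-ones/all-zeros mask.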
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #75 from Oleg Endo ---
Author: olegendo
Date: Fri May 16 22:54:32 2014
New Revision: 210535

URL: http://gcc.gnu.org/viewcvs?rev=210535&root=gcc&view=rev
Log:
gcc/
	PR target/51244
	* config/sh/sh.c (sh_eval_treg_value): Handle t_reg_operand and
	negt_reg_operand cases.
	* config/sh/sh.md (*cset_zero): Likewise by using cbranch_treg_value
	predicate.
	* config/sh/predicates.md (cbranch_treg_value): Simplify.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/predicates.md
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #73 from Oleg Endo ---
Author: olegendo
Date: Fri Dec 6 10:46:53 2013
New Revision: 205734

URL: http://gcc.gnu.org/viewcvs?rev=205734&root=gcc&view=rev
Log:
	PR target/51244
	PR target/59343
	* config/sh/sh.md (*cbranch_t): Check that there are no labels between
	the s1 insn and the testing insn.  Remove REG_DEAD note from s1 insn.

	PR target/51244
	PR target/59343
	* gcc.target/sh/pr51244-19.c: Adjust test case.

Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/config/sh/sh.md
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/sh/pr51244-19.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #72 from Oleg Endo ---
The original test case in PR 59343 is an interesting one with regard to T bit
optimizations (or the lack thereof):

void validate_number (char **numbertext)
{
  char *ptr = *numbertext;
  int valid = (ptr != 0) && (*ptr);

  for ( ; valid && *ptr; ++ptr)
    valid = (*ptr >= '0');

  if (!valid)
    *numbertext = 0;
}

With -Os -m4 -mb it is compiled to:

_validate_number:
        mov.l   @r4,r2          // [bb 2]
        tst     r2,r2
        bt/s    .L2
        mov     #0,r1

        mov.b   @r2,r1          // [bb 3]
        tst     r1,r1
        mov     #-1,r1
        negc    r1,r1
.L2:                            // [bb 4]
        mov     #47,r3
.L3:                            // [bb 5]
        tst     r1,r1
        bt      .L4

        mov.b   @r2+,r1         // [bb 6]
        tst     r1,r1
        bt/s    .L8
        cmp/gt  r3,r1           // [bb 7]

        bra     .L3
        movt    r1
.L4:
        mov.l   r1,@r4          // [bb 8]
.L8:
        rts
        nop

The basic block starting with L3 (bb 5) has three different r1 inputs from
[bb 2], [bb 3] and [bb 7].  When sh_treg_combine tries to trace r1 starting
in [bb 5]:

tracing (reg/v:SI 1 r1 [orig:185 valid ] [185])
[bb 5]
set of reg not found.  empty BB?
[bb 4]
set of reg not found (cstore)
set not found - aborting trace

Instead it should skip [bb 4] as it doesn't modify r1 or T bit and check
[bb 3] and [bb 2].  Because the setcc insns are not the same in [bb 2],
[bb 3] and [bb 7], it would try to eliminate the cstores.  However, in [bb 2]
there is no real cstore but a constant load, which can be replaced with a
clrt or sett insn respectively.  The resulting code could be something like:

        mov.l   @r4,r2
        mov     #0,r1
        tst     r2,r2
        bt/s    .L2             // (*)
        clrt

        mov.b   @r2,r1
        tst     r1,r1
        movt    r1
        tst     r1,r1           // T = !T
.L2:
        mov     #47,r3
.L3:
        bf      .L4

        mov.b   @r2+,r1
        tst     r1,r1
        bt/s    .L8
        bra     .L3
        cmp/gt  r3,r1
.L4:
        mov.l   r1,@r4
.L8:
        rts
        nop

(*) The clrt insn actually has to be inserted before the conditional branch,
which is impossible as it modifies the branch condition.  Putting it into the
delay slot however is OK, which is usually done by the DBR pass.  A special
"branch and set/clear T" pseudo insn would be required (requires SH2+) which
produces the sequence above.

A more complicated way would be to create new basic blocks.  The basic block
reordering or similar RTL pass and the clrt/sett optimization pass should then
be able to simplify the code further to:

        mov.l   @r4,r2
        tst     r2,r2
        bf/s    .L4
        mov     #0,r1

        mov.b   @r2,r1
        tst     r1,r1
        bt/s    .L4
        mov     #47,r3
.L3:
        mov.b   @r2+,r1
        tst     r1,r1
        bt/s    .L8
        cmp/gt  r3,r1
        bt      .L3
.L4:
        mov.l   r1,@r4
.L8:
        rts
        nop
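The tracing behaviour asked for above (skip a block such as [bb 4] that does not set r1 and keep walking into its predecessors) amounts to a backward depth-first search over the CFG. The sketch below is not the sh_treg_combine implementation: bb_info, collect_setters and trace_reg are hypothetical, and the model only records whether a block sets the traced register, ignoring the T bit checks a real pass would also need.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BB   16
#define MAX_PRED 4

/* Hypothetical CFG model: predecessors of each block, and whether the
   block sets the traced register (r1 here). */
typedef struct {
    int  preds[MAX_PRED];
    int  n_preds;
    bool sets_reg;
} bb_info;

/* Walk backwards from 'bb' and collect every block that provides the
   value of the traced register.  Blocks that don't set it are skipped and
   their predecessors are searched instead.  Returns false if a path
   reaches a block without predecessors and without a set. */
static bool collect_setters(const bb_info *cfg, int bb, bool *visited,
                            int *setters, int *n_setters)
{
    if (visited[bb])
        return true;                 /* already handled via another path */
    visited[bb] = true;

    if (cfg[bb].sets_reg) {
        setters[(*n_setters)++] = bb;
        return true;
    }
    if (cfg[bb].n_preds == 0)
        return false;                /* value comes from outside: give up */

    for (int i = 0; i < cfg[bb].n_preds; ++i)
        if (!collect_setters(cfg, cfg[bb].preds[i], visited, setters,
                             n_setters))
            return false;
    return true;
}

/* Start at the block containing the test; the reaching sets must come
   from its predecessors. */
static bool trace_reg(const bb_info *cfg, int start, int *setters,
                      int *n_setters)
{
    bool visited[MAX_BB] = { false };
    *n_setters = 0;
    visited[start] = true;
    for (int i = 0; i < cfg[start].n_preds; ++i)
        if (!collect_setters(cfg, cfg[start].preds[i], visited, setters,
                             n_setters))
            return false;
    return true;
}
```

Encoding the listing above (bb 5 with predecessors bb 4 and bb 7, bb 4 with predecessors bb 2 and bb 3) makes trace_reg(cfg, 5, ...) report bb 2, bb 3 and bb 7 as the three r1 providers instead of aborting at bb 4.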
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #71 from Oleg Endo ---
(In reply to Oleg Endo from comment #70)
> Author: olegendo
> Date: Sat Oct 12 20:47:22 2013
> New Revision: 203492

The issue raised in comment #59 has been fixed on 4.9.  There are some open
issues though, which I will try to address in follow-up patches:

* The helper functions in sh_treg_combine.cc should go into a separate .h +
  .cc file.  This would allow re-using them in other places and eliminate the
  similar function 'sh_find_set_of_reg' in sh.c.

* The RTL pass does the treg combine only when there is a conditional branch.
  It should also handle conditional move insns (-mpretend-cmove).

* The function 'try_combine_comparisons' in sh_treg_combine.cc always
  introduces reg-reg copies.  In some cases (DImode comparisons in
  particular), these reg-reg moves don't get eliminated afterwards before
  register allocation.  The function should check whether creating new
  pseudos can be avoided by re-using existing regs.

The sh_treg_combine RTL pass could probably be backported to 4.8 but seems
too intrusive.  Instead something like the patch in comment #64 should do,
where instead of checking for 'no_labels_between_p' it would probably be
better to check if the basic block with the conditional branch has only one
predecessor.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #70 from Oleg Endo ---
Author: olegendo
Date: Sat Oct 12 20:47:22 2013
New Revision: 203492

URL: http://gcc.gnu.org/viewcvs?rev=203492&root=gcc&view=rev
Log:
	PR target/51244
	* config/sh/sh_treg_combine.cc: New SH specific RTL pass.
	* config.gcc (SH extra_objs): Add sh_ifcvt.o.
	* config/sh/t-sh (sh_treg_combine.o): New entry.
	* config/sh/sh.c (sh_fixed_condition_code_regs): New function that
	implements the target hook TARGET_FIXED_CONDITION_CODE_REGS.
	(register_sh_passes): New function.  Register sh_treg_combine pass.
	(sh_option_override): Invoke it.
	(sh_canonicalize_comparison): Handle op0_preserve_value.
	* sh.md (*cbranch_t): Do not try to optimize missed test and branch
	opportunities.  Canonicalize branch condition.
	(nott): Allow only if pseudos can be created for non-SH2A.

	PR target/51244
	* gcc.dg/torture/pr51244-21.c: New.
	* gcc.target/sh/pr51244-20.c: New.
	* gcc.target/sh/pr51244-20-sh2a.c: New.

Added:
    trunk/gcc/config/sh/sh_treg_combine.cc
    trunk/gcc/testsuite/gcc.dg/torture/pr51244-21.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-20.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config.gcc
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/config/sh/t-sh
    trunk/gcc/testsuite/ChangeLog
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #30889|0                           |1
        is obsolete|                            |

--- Comment #69 from Oleg Endo ---
Created attachment 30953
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30953&action=edit
RTL pass

(In reply to Oleg Endo from comment #68)
> Created attachment 30889 [details]
> RTL pass
>
> An updated patch that adds an SH specific RTL pass against current trunk
> (rev 202873), not fully tested.
>
> CSiBE for '-m2a-single -O2' and '-m4-single -mpretend-cmove -O2' look OK.
> There are only 2 cases that got actually worse in the set:
>
> linux-2.4.23-pre3-testplatform/net/ipv4/igmp.s (add_grec):
>
> before:
> .L459:
>         bt      .L294
>         mov.l   @(24,r13),r1
>         tst     r1,r1
>         bt/s    .L295
>         add     #64,r1
>         mov     r13,r2
>         add     #64,r2
>         mov.l   @(36,r1),r1
>         mov.l   @(32,r2),r2
>         sub     r2,r1
>         mov     #11,r2
>         cmp/hs  r1,r2
> .L296:
>         bf/s    .L294
>         mov     r13,r4
>         mov.l   .L408,r0
>         jsr     @r0
>         mov     #0,r13
>
> after:
> .L459:
>         bt      .L294
>         mov.l   @(24,r13),r1
>         tst     r1,r1
>         bt      .L295
>         add     #64,r1
>         mov     r13,r2
>         add     #64,r2
>         mov.l   @(36,r1),r1
>         mov.l   @(32,r2),r2
>         sub     r2,r1
>         mov     #11,r2
>         cmp/hs  r1,r2
>         movt    r1
> .L296:
>         tst     r1,r1
>         bt/s    .L294
>         mov     r13,r4
>         mov.l   .L408,r0
>         jsr     @r0
>         mov     #0,r13

That case didn't get worse, it actually improved.  The 'before' code is wrong
code, due to a missed BB that sets the tested 'r1' reg to '1'.

Testing the previous version of the RTL pass (attachment 30889) against trunk
rev 202876 revealed a defect in the function 'trace_reg_uses'.  The attached
updated version fixes this.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #30689|0                           |1
        is obsolete|                            |

--- Comment #68 from Oleg Endo ---
Created attachment 30889
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30889&action=edit
RTL pass

An updated patch that adds an SH specific RTL pass against current trunk
(rev 202873), not fully tested.

CSiBE for '-m2a-single -O2' and '-m4-single -mpretend-cmove -O2' look OK.
There are only 2 cases that got actually worse in the set:

linux-2.4.23-pre3-testplatform/fs/lockd/host.s (nlm_lookup_host):

before:
.L142:
        bt      .L60
        mov.l   @(20,r11),r6
        cmp/eq  r6,r10
        bf      .L58
        add     r1,r13

after:
.L142:
        bt      .L60
        mov.l   @(20,r11),r6
        mov     r10,r5
        cmp/eq  r6,r5
        bf      .L58
        add     r1,r13

linux-2.4.23-pre3-testplatform/net/ipv4/igmp.s (add_grec):

before:
.L459:
        bt      .L294
        mov.l   @(24,r13),r1
        tst     r1,r1
        bt/s    .L295
        add     #64,r1
        mov     r13,r2
        add     #64,r2
        mov.l   @(36,r1),r1
        mov.l   @(32,r2),r2
        sub     r2,r1
        mov     #11,r2
        cmp/hs  r1,r2
.L296:
        bf/s    .L294
        mov     r13,r4
        mov.l   .L408,r0
        jsr     @r0
        mov     #0,r13

after:
.L459:
        bt      .L294
        mov.l   @(24,r13),r1
        tst     r1,r1
        bt      .L295
        add     #64,r1
        mov     r13,r2
        add     #64,r2
        mov.l   @(36,r1),r1
        mov.l   @(32,r2),r2
        sub     r2,r1
        mov     #11,r2
        cmp/hs  r1,r2
        movt    r1
.L296:
        tst     r1,r1
        bt/s    .L294
        mov     r13,r4
        mov.l   .L408,r0
        jsr     @r0
        mov     #0,r13
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #67 from Kazumoto Kojima ---
(In reply to Oleg Endo from comment #66)
> Kaz, the "WIP status" aside, would you be OK with something like that?

Yep.  Sounds good to me.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #66 from Oleg Endo ---
Created attachment 30689
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30689&action=edit
WIP RTL pass

Just wanted to give an update on the issue.  I've been writing an SH specific
RTL pass that handles those multiple BB cases as a replacement for the
splitter in the *cbranch_t pattern.  Basically, it tries to combine
comparisons and T bit cstores before cbranches across multiple blocks.
There are still quite a few open issues and some copy pasta to be folded, but
the pass can already eliminate the test cases mentioned before.  Moreover, it
also optimizes DImode comparisons and can utilize SH2A's nott instruction
better.  In order to get good results, the pass has to be run twice.

I've developed this against rev. 201282, so it also needs some adaptation for
the new passes stuff that's been done recently on trunk.

Kaz, the "WIP status" aside, would you be OK with something like that?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #65 from Oleg Endo ---
(In reply to Oleg Endo from comment #64)
>
> would be simplified to this:
>
>         mov.l   @(4,r4),r1
>         tst     r1,r1           // T = @(4,r4) == 0
> .L3:
>         bt/s    .L5
>         mov     #1,r1
>         cmp/hi  r1,r5
>         bf/s    .L9
>         mov     #0,r0
>         rts
>         nop
> .L2:
>         mov.l   @r4,r1
>         bra     .L3
>         tst     r1,r1           // T = @(r4) == 0

Sorry, I got confused.  The above is wrong.  One of the T bit inversions
can't be eliminated in this case.  It should be:

        mov.l   @(4,r4),r1
.L3:
        tst     r1,r1
        bt/s    .L5
        mov     #1,r1
        cmp/hi  r1,r5
        bf/s    .L9
        mov     #0,r0
        rts
        nop
.L2:
        mov.l   @r4,r1
        tst     r1,r1
        bra     .L3
        movt    r1

Or SH2A:

        mov.l   @(4,r4),r1
        tst     r1,r1
.L3:
        bt/s    .L5
        mov     #1,r1
        cmp/hi  r1,r5
        bf/s    .L9
        mov     #0,r0
        rts
        nop
.L2:
        mov.l   @r4,r1
        tst     r1,r1
        bra     .L3
        nott

However, my original 'optimized' asm snippet is valid if the reduced test case
is changed to:

static inline int blk_oversized_queue (int* q)
{
  if (q[2])
    return q[1] == 0;  // instead of != 0
  return q[0] == 0;
}

The current trunk version eliminates the movt/tst insns and produces correct
code by accident.  It can be simplified even more:

        mov.l   @(4,r4),r1
.L3:
        tst     r1,r1
        bt/s    .L5
        mov     #1,r1
        cmp/hi  r1,r5
        bf/s    .L9
        mov     #0,r0
        rts
        nop
.L2:
        bra     .L3
        mov.l   @r4,r1

I'm trying to come up with a patch that implements T bit tracing in order to
handle those scenarios.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #64 from Oleg Endo ---
(In reply to Laurent Aflonsi from comment #61)
>
> The movt(L2) and the tst(L3) are both removed, and that's coherent for that
> run path, because it is preceded by the tst r2,r2.
> But that makes the first path incoherent because L3 can be reached by the
> very first block. I have written a first fix, too restrictive ("pr25869-19.c
> scan-assembler-not movt" is failing) :
>
> --- ./gcc/gcc/config/sh/sh.md.orig
> +++ ./gcc/gcc/config/sh/sh.md
> @@ -8523,7 +8523,8 @@
>          T bit.  Notice that some T bit stores such as negc also modify
>          the T bit.  */
>       if (modified_between_p (get_t_reg_rtx (), s1.insn, testing_insn)
> -        || modified_in_p (get_t_reg_rtx (), s1.insn))
> +        || modified_in_p (get_t_reg_rtx (), s1.insn)
> +        || !no_labels_between_p(s1.insn, testing_insn))
>         operands[2] = NULL_RTX;
>
>       break;
>
> The idea would be to check if "s1.insn block dominates testing_insn block",
> but I don't know how to write it at this stage.

The proper way would be to find all basic blocks that set the tested reg.
With the reduced test case, just right before the split1 pass there are two
basic blocks that set reg 167, which is then tested for '== 0' before the
conditional branch:

(note 13 12 14 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
<...>
(insn 15 14 16 3 (set (reg:SI 147 t)
        (eq:SI (reg:SI 173 [ MEM[(int *)q_3(D) + 4B] ])
            (const_int 0 [0]))) sh_tmp.cpp:84 17 {cmpeqsi_t}
     (expr_list:REG_DEAD (reg:SI 173 [ MEM[(int *)q_3(D) + 4B] ])
        (nil)))
(insn 16 15 17 3 (set (reg:SI 175)
        (const_int -1 [0x])) sh_tmp.cpp:84 250 {movsi_ie}
     (nil))
(note 17 16 18 3 NOTE_INSN_DELETED)
(insn 18 17 71 3 (parallel [
            (set (reg:SI 167 [ D.1424 ])
                (xor:SI (reg:SI 147 t)
                    (const_int 1 [0x1])))
            (set (reg:SI 147 t)
                (const_int 1 [0x1]))
            (use (reg:SI 175))
        ]) sh_tmp.cpp:84 394 {movrt_negc}
     (expr_list:REG_DEAD (reg:SI 175)
        (expr_list:REG_UNUSED (reg:SI 147 t)
            (nil))))
(jump_insn 71 18 72 3 (set (pc)
        (label_ref 27)) -1
     (nil)
 -> 27)
(barrier 72 71 21)
(code_label 21 72 22 4 2 "" [1 uses])
(note 22 21 23 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
<...>
(insn 24 23 26 4 (set (reg:SI 147 t)
        (eq:SI (reg:SI 177 [ *q_3(D) ])
            (const_int 0 [0]))) sh_tmp.cpp:85 17 {cmpeqsi_t}
     (expr_list:REG_DEAD (reg:SI 177 [ *q_3(D) ])
        (nil)))
(insn 26 24 27 4 (set (reg:SI 167 [ D.1424 ])
        (reg:SI 147 t)) sh_tmp.cpp:85 392 {movt}
     (expr_list:REG_DEAD (reg:SI 147 t)
        (nil)))
(code_label 27 26 28 5 3 "" [1 uses])
(note 28 27 29 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(insn 29 28 30 5 (set (reg:SI 147 t)
        (eq:SI (reg:SI 167 [ D.1424 ])
            (const_int 0 [0]))) sh_tmp.cpp:91 17 {cmpeqsi_t}
     (expr_list:REG_DEAD (reg:SI 167 [ D.1424 ])
        (nil)))
(jump_insn 30 29 31 5 (set (pc)
        (if_then_else (ne (reg:SI 147 t)
                (const_int 0 [0]))
            (label_ref:SI 50)
            (pc))) sh_tmp.cpp:91 295 {*cbranch_t}
     (expr_list:REG_DEAD (reg:SI 147 t)
        (expr_list:REG_BR_PROB (const_int 400 [0x190])
            (nil)))
 -> 50)

Here it starts walking up the insns from insn 29 [bb 5] and finds insn 26
[bb 4], but it should also check [bb 3].  The question then is what to do
with the collected basic blocks.  Ideally it should look at all the T bit
paths in every basic block and try to eliminate redundant T bit flipping in
each basic block so that in this case [bb 5] can start with the conditional
branch.  Then this ...

        mov.l   @(4,r4),r1
        tst     r1,r1           // T = @(4,r4) == 0
        mov     #-1,r1
        negc    r1,r1           // r1 = @(4,r4) != 0
.L3:
        tst     r1,r1           // T = @(4,r4) == 0
        bt/s    .L5
        mov     #1,r1
        cmp/hi  r1,r5
        bf/s    .L9
        mov     #0,r0
        rts
        nop
.L2:
        mov.l   @r4,r1
        tst     r1,r1           // T = @(r4) == 0
        bra     .L3
        movt    r1              // r1 = @(r4) == 0

would be simplified to this:

        mov.l   @(4,r4),r1
        tst     r1,r1           // T = @(4,r4) == 0
.L3:
        bt/s    .L5
        mov     #1,r1
        cmp/hi  r1,r5
        bf/s    .L9
        mov     #0,r0
        rts
        nop
.L2:
        mov.l   @r4,r1
        bra     .L3
        tst     r1,r1           // T = @(r4) == 0

Maybe if BImode was used for the T bit, combine could do better at folding T
bit flipping.  However, it would not do cross-BB analysis, so I think it's
pointless to try out BImode.

I'm not sure whether there is already something in the compiler that could do
this kind of optimization.  According to my observations it should happen
after the combine pass and before register allocation to get useful results.

Until then I think the following should be applied to 4.9 and 4.
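Laurent's dominance idea can be checked on the block structure of the dump above. The sketch below uses the textbook definition (D dominates B iff B becomes unreachable from the entry once D is removed), which is quadratic and only meant for illustration; inside GCC one would use the compiler's own dominator information (e.g. dominated_by_p) instead. The CFG is an assumption reconstructed from the dump: bb 2 is taken to branch to bb 3 and bb 4 (the if (q[2]) test), and both of those reach bb 5. cfg_node, mark_reachable and dominates are hypothetical names.

```c
#include <stdbool.h>

#define N_BB     8
#define MAX_SUCC 2

/* Hypothetical CFG as successor lists; -1 marks an unused slot. */
typedef struct { int succ[MAX_SUCC]; } cfg_node;

/* Mark every block reachable from 'bb' without passing through 'removed'. */
static void mark_reachable(const cfg_node *cfg, int bb, int removed,
                           bool *seen)
{
    if (bb < 0 || bb == removed || seen[bb])
        return;
    seen[bb] = true;
    for (int i = 0; i < MAX_SUCC; ++i)
        mark_reachable(cfg, cfg[bb].succ[i], removed, seen);
}

/* 'd' dominates 'b' iff every path from 'entry' to 'b' goes through 'd',
   i.e. 'b' is unreachable once 'd' is taken out of the graph. */
static bool dominates(const cfg_node *cfg, int entry, int d, int b)
{
    if (d == b)
        return true;
    bool seen[N_BB] = { false };
    mark_reachable(cfg, entry, d, seen);
    return !seen[b];
}
```

With that CFG neither bb 3 nor bb 4 dominates bb 5, so a dominance check on the single setter found by walking up from insn 29 (insn 26 in bb 4) would have rejected the fold; it would also reject it in cases where all setters agree, which is why collecting every setting block is the more useful approach.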
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #63 from Oleg Endo ---
Created attachment 30566
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30566&action=edit
Reduced test

(In reply to Laurent Aflonsi from comment #58)
> Created attachment 30524 [details]
> functional regression

This is a stripped down test case.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #62 from Oleg Endo ---
(In reply to Laurent Aflonsi from comment #61)
>
> More generally, I'm surprised to see that optimization at mapping level,
> isn't this a generic problematic that should be handled at rtl dead code
> elimination stage on the T bit register ?

Actually, it is a kind of generic case.  Dead code elimination would not do
this kind of logic folding.  Usually this kind of stuff is handled by the
combine pass, which can figure out some redundant operations or operations
that cancel each other out.  However, combine's logic is also limited and the
overall T bit handling is a bit shaky.  That's why I introduced the
additional elimination handling that is done in the split pass after the
combine pass on insns that combine didn't catch.  I didn't want to introduce
another RTL pass just for this, and touching the combine pass also didn't
seem attractive since all the other backends depend on its behavior.

Maybe it would be better to switch T_REG from SImode to BImode, which
reflects reality.  This should be relatively straightforward to do.
Another idea would be to try out using CCmode.  There are some additional
optimizations done on CCmode.  However, this is a bigger change.
[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #61 from Laurent Aflonsi ---
Yes, that's the point. L3 can be reached by another block (L2):

        tst     r2,r2
        mov     #-1,r2
        negc    r2,r2
.L3:
        tst     r2,r2
        bt/s    .L11
        [...]
.L2:
        mov.l   @r4,r2
        tst     r2,r2
        bra     .L3
        movt    r2

The movt (L2) and the tst (L3) are both removed, and that's coherent for that run path, because it is preceded by the tst r2,r2. But that makes the first path incoherent, because L3 can also be reached from the very first block.

I have written a first fix, but it is too restrictive ("pr25869-19.c scan-assembler-not movt" is failing):

--- ./gcc/gcc/config/sh/sh.md.orig
+++ ./gcc/gcc/config/sh/sh.md
@@ -8523,7 +8523,8 @@
          T bit.  Notice that some T bit stores such as negc also modify
          the T bit.  */
       if (modified_between_p (get_t_reg_rtx (), s1.insn, testing_insn)
-          || modified_in_p (get_t_reg_rtx (), s1.insn))
+          || modified_in_p (get_t_reg_rtx (), s1.insn)
+          || !no_labels_between_p(s1.insn, testing_insn))
         operands[2] = NULL_RTX;
 
       break;

The idea would be to check whether "s1.insn block dominates testing_insn block", but I don't know how to write that at this stage.

More generally, I'm surprised to see that optimization at mapping level, isn't this a generic problematic that should be handled at rtl dead code elimination stage on the T bit register ?

Thanks,
Laurent
[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #60 from Oleg Endo ---
(In reply to Laurent Aflonsi from comment #59)
> I have a functional regression due to this improvement when we are compiling
> the enclosed example in -O2.
> $ sh-superh-elf-gcc -O2 pr51244-20-main.c pr51244-20.c
> $ sh-superh-elf-run a.out
> FAIL
>
> Thus, the code is transformed from :
> _get_request:
>         mov.l   @(12,r4),r1
>         tst     r1,r1
>         bt      .L2
>         mov.l   @(4,r4),r2
>         tst     r2,r2
>         mov     #-1,r2
>         negc    r2,r2
> .L3:
>         tst     r2,r2
>         bt/s    .L11
>         mov     #-100,r0
>         mov     #1,r2
>         [...]
>
> to :
> _get_request:
>         mov.l   @(12,r4),r1
>         tst     r1,r1
>         bt      .L2
>         mov.l   @(4,r4),r2
>         tst     r2,r2
>         mov     #-1,r2
>         negc    r2,r2
> .L3:
>         bf/s    .L11
>         mov     #-100,r0
>         mov     #1,r2
>         [...]
>
> With the inputs encoded in the main function, we are supposed to follow the
> simplest flow (no jump), but when this optimization is enabled, we are
> jumping to L11 due to the bt -> bf transformation.

The idea was that sequences such as

        tst     r2,r2
        mov     #-1,r2
        negc    r2,r2
        tst     r2,r2
        bt      ...

should be folded to

        tst     r2,r2
        bt      ...

... if r2 is dead afterwards (which it seems to be). I guess I failed to handle some cases where the tested register is in a loop or can be reached by some other basic block. I'll check out the details.
[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #59 from Laurent Aflonsi ---
I have a functional regression due to this improvement when we are compiling the enclosed example in -O2.

$ sh-superh-elf-gcc -O2 pr51244-20-main.c pr51244-20.c
$ sh-superh-elf-run a.out
FAIL

Thus, the code is transformed from:

_get_request:
        mov.l   @(12,r4),r1
        tst     r1,r1
        bt      .L2
        mov.l   @(4,r4),r2
        tst     r2,r2
        mov     #-1,r2
        negc    r2,r2
.L3:
        tst     r2,r2
        bt/s    .L11
        mov     #-100,r0
        mov     #1,r2
        [...]

to:

_get_request:
        mov.l   @(12,r4),r1
        tst     r1,r1
        bt      .L2
        mov.l   @(4,r4),r2
        tst     r2,r2
        mov     #-1,r2
        negc    r2,r2
.L3:
        bf/s    .L11
        mov     #-100,r0
        mov     #1,r2
        [...]

With the inputs encoded in the main function, we are supposed to follow the simplest flow (no jump), but when this optimization is enabled, we are jumping to L11 due to the bt -> bf transformation.

Could you please look at it?

Thanks,
Laurent
[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Laurent Aflonsi changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |laurent.alfonsi at st dot com

--- Comment #58 from Laurent Aflonsi ---
Created attachment 30524
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30524&action=edit
functional regression
[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #57 from Oleg Endo 2012-11-03 12:01:05 UTC ---
Author: olegendo
Date: Sat Nov 3 12:01:01 2012
New Revision: 193119

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193119
Log:
        PR target/51244
        * config/sh/sh.md (*cbranch_t): Allow splitting after reload.
        Allow going beyond current basic block before reload when looking
        for the reg set insn.
        * config/sh/sh.c (sh_find_set_of_reg): Don't stop at labels.

        PR target/51244
        * gcc.target/sh/pr51244-18.c: New.
        * gcc.target/sh/pr51244-19.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr51244-18.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-19.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #56 from Oleg Endo 2012-10-15 22:08:14 UTC ---
Author: olegendo
Date: Mon Oct 15 22:08:07 2012
New Revision: 192481

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192481
Log:
        PR target/51244
        * config/sh/sh-protos.h (set_of_reg): New struct.
        (sh_find_set_of_reg, sh_is_logical_t_store_expr,
        sh_try_omit_signzero_extend): Declare...
        * config/sh/sh.c (sh_find_set_of_reg, sh_is_logical_t_store_expr,
        sh_try_omit_signzero_extend): ...these new functions.
        * config/sh/sh.md (*logical_op_t): New insn_and_split.
        (*zero_extendsi2_compact): Use sh_try_omit_signzero_extend in
        splitter.
        (*extendsi2_compact_reg): Convert to insn_and_split.
        Use sh_try_omit_signzero_extend in splitter.
        (*mov_reg_reg): Disallow t_reg_operand as operand 1.
        (*cbranch_t): Rewrite combine part in splitter using new
        sh_find_set_of_reg function.

        PR target/51244
        * gcc.target/sh/pr51244-17.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr51244-17.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #55 from Oleg Endo 2012-10-12 00:41:31 UTC ---
Author: olegendo
Date: Fri Oct 12 00:41:23 2012
New Revision: 192387

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192387
Log:
        PR target/51244
        * config/sh/sh.md (negsi_cond, negdi_cond, stack_protect_test): Remove
        get_t_reg_rtx when invoking gen_branch_true or gen_branch_false.
        (*zero_extendsi2_compact): Convert to insn_and_split.  Convert zero
        extensions of T bit stores to reg moves in splitter.  Remove obsolete
        unnamed peephole2 that caught zero extensions after negc T bit stores.
        (*branch_true_eq, *branch_false_ne): Delete.
        (branch_true, branch_false): Convert insn to expander.  Move actual
        insn logic to...
        (*cbranch_t): ...this new insn_and_split.  Try to find preceding
        redundant T bit stores and tests and combine them with the conditional
        branch if possible in the splitter.
        (movrt_xor, *movt_movrt): New insn_and_split.
        * config/sh/predicates.md (cbranch_treg_value): New predicate.
        * config/sh/sh-protos.h (sh_eval_treg_value): Forward declare...
        * config/sh/sh.c (sh_eval_treg_value): ...this new function.
        (expand_cbranchsi4, expand_cbranchdi4): Remove get_t_reg_rtx when
        invoking gen_branch_true or gen_branch_false.

        PR target/51244
        * gcc.target/sh/pr51244-13.c: New.
        * gcc.target/sh/pr51244-14.c: New.
        * gcc.target/sh/pr51244-15.c: New.
        * gcc.target/sh/pr51244-16.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr51244-13.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-14.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-15.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-16.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/predicates.md
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #54 from Oleg Endo 2012-10-03 21:39:22 UTC ---
Author: olegendo
Date: Wed Oct 3 21:39:18 2012
New Revision: 192052

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192052
Log:
        PR target/51244
        * config/sh/sh.md (*mov_t_msb_neg): New insn and two accompanying
        unnamed split patterns.

        PR target/51244
        * gcc.target/sh/pr51244-12.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr51244-12.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #53 from Oleg Endo 2012-09-23 21:41:55 UTC ---
Another case that seems to go awry:

#include <stdbool.h>

int test_1 (int a, int b, int c, int* d)
{
  bool x = a == 0;
  d[2] = !x;
  return x ? b : c;
}

-O2 -m4:
        tst     r4,r4
        mov     #1,r1
        movt    r0
        xor     r0,r1
        tst     r0,r0
        bt/s    .L5
        mov.l   r1,@(8,r7)
        mov     r5,r6
.L5:
        rts
        mov     r6,r0

This should be something like:
        tst     r4,r4
        movt    r0
        xor     #1,r0
        bf/s    .L5
        mov.l   r0,@(8,r7)
        mov     r5,r6
.L5:
        rts
        mov     r6,r0

-O2 -m2a:
        tst     r4,r4
        movt    r0
        mov     #1,r1
        xor     r0,r1
        mov.l   r1,@(8,r7)
        tst     r0,r0
        bf      .L6
        mov     r6,r0
        rts/n
        .align 1
.L6:
        rts
        mov     r5,r0

This should be:
        tst     r4,r4
        movrt   r1
        mov.l   r1,@(8,r7)
        bt      .L6
        mov     r6,r0
        rts/n
        .align 1
.L6:
        rts
        mov     r5,r0
[Bug target/51244] [SH] Inefficient conditional branch and code around T bit
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|SH Target: Inefficient      |[SH] Inefficient
                   |conditional branch          |conditional branch and code
                   |                            |around T bit

--- Comment #52 from Oleg Endo 2012-09-04 08:03:08 UTC ---
Author: olegendo
Date: Tue Sep 4 08:03:01 2012
New Revision: 190909

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190909
Log:
        PR target/51244
        * config/sh/sh.c (prepare_cbranch_operands): Pull out comparison
        canonicalization code into...
        * (sh_canonicalize_comparison): This new function.
        * config/sh/sh-protos.h: Declare it.
        * config/sh/sh.h: Use it in new macro CANONICALIZE_COMPARISON.
        * config/sh/sh.md (cbranchsi4): Remove TARGET_CBRANCHDI4 check and
        always invoke expand_cbranchsi4.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.h
    trunk/gcc/config/sh/sh.md