[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #52 from Oleg Endo 2012-09-04 08:03:08 UTC --- Author: olegendo Date: Tue Sep 4 08:03:01 2012 New Revision: 190909 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190909 Log: PR target/51244 * config/sh/sh.c (prepare_cbranch_operands): Pull out comparison canonicalization code into... * (sh_canonicalize_comparison): This new function. * config/sh/sh-protos.h: Declare it. * config/sh/sh.h: Use it in new macro CANONICALIZE_COMPARISON. * config/sh/sh.md (cbranchsi4): Remove TARGET_CBRANCHDI4 check and always invoke expand_cbranchsi4. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh-protos.h trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.h trunk/gcc/config/sh/sh.md
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #51 from Oleg Endo 2012-08-31 15:50:35 UTC --- (In reply to comment #50) Thanks! Hmm .. difficult. There seem to be 17 improvements and 10 dis-improvements, but the dis-improvements seem heavier. The improvement avg is 1.1% and the dis-improvements avg is -2.1%. I don't know .. maybe this should wait a bit more.
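The counts and averages quoted above can be re-derived from the ratio column of the runtime table in comment #50. The following is only a sketch for checking the arithmetic; the array is transcribed from that table, and the names `csibe_ratio`, `ratio_summary` and `summarize` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio column (%) transcribed from the runtime table in comment #50,
   in table order.  Positive means faster with -mpretend-cmove.  */
static const double csibe_ratio[] = {
    -0.84312,  0.213068, -1.47115,  0.996678,  1.05634,  -3.75,
     1.22888,  1.04167,   0.867679, 1.89504,   2.07715,   0.556174,
    -3.80952, -2.05479,   2.22469, -2.46914,   1.78571,   2.42215,
    -2.19138, -1.93575,  -0.296296, 0.400802,  0.21335,   1.48702,
    -2.0074,   0.504468,  0.363416
};
#define CSIBE_COUNT (sizeof csibe_ratio / sizeof csibe_ratio[0])

struct ratio_summary {
    size_t n_pos, n_neg;      /* number of improvements / dis-improvements */
    double avg_pos, avg_neg;  /* their average ratios in percent */
};

/* Split the ratios by sign and average each group separately.  */
static struct ratio_summary summarize(const double *r, size_t n)
{
    struct ratio_summary s = { 0, 0, 0.0, 0.0 };
    double sum_pos = 0.0, sum_neg = 0.0;

    assert(r != NULL);
    for (size_t i = 0; i < n; i++) {
        if (r[i] > 0.0) {
            s.n_pos++;
            sum_pos += r[i];
        } else if (r[i] < 0.0) {
            s.n_neg++;
            sum_neg += r[i];
        }
    }
    if (s.n_pos)
        s.avg_pos = sum_pos / (double)s.n_pos;
    if (s.n_neg)
        s.avg_neg = sum_neg / (double)s.n_neg;
    return s;
}
```

Calling `summarize(csibe_ratio, CSIBE_COUNT)` gives the two group sizes and the two averages, which can then be compared with the figures in the comment.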
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #50 from Kazumoto Kojima 2012-08-31 10:54:44 UTC --- (In reply to comment #49)
> Kaz, if you have some time, could you please gather some CSiBE runtime numbers
> for '-mpretend-cmove' and without it?
Here is the runtime result with -O2:

test                           no         cmove      ratio(%)
bzip2-1.0.2 bzip2.d            10.9767    11.07      -0.84312
bzip2-1.0.2 bzip2recover       4.70333    4.69333    0.213068
bzip2-1.0.2 bzip2.c            43.0867    43.73      -1.47115
compiler vam.fib               2.02667    2.00667    0.996678
compiler vam.fact              1.91333    1.89333    1.05634
compiler vam.test2             0.256667   0.27       -3.75
flex-2.5.31 flex               13.18      13.02      1.22888
jikespg-1.3 jikespg            1.61667    1.6        1.04167
jpeg-6b jpegtran2              4.65       4.61       0.867679
jpeg-6b djpeg2                 2.33       2.28667    1.89504
jpeg-6b djpeg1                 2.29333    2.24667    2.07715
jpeg-6b cjpeg2                 3.01333    2.99667    0.556174
jpeg-6b djpeg0                 0.336667   0.35       -3.80952
jpeg-6b cjpeg0                 0.476667   0.486667   -2.05479
jpeg-6b cjpeg1                 3.06333    2.99667    2.22469
jpeg-6b jpegtran0              0.26       0.27       -2.46914
jpeg-6b jpegtran1              1.9        1.86667    1.78571
libpng-1.2.5 png2pnm0          0.986667   0.96       2.42215
libpng-1.2.5 pnm2png1          44.6333    45.6333    -2.19138
libpng-1.2.5 pnm2png0          7.93667    8.09333    -1.93575
libpng-1.2.5 png2pnm1          6.73       6.75       -0.296296
teem-1.6.0-src dehex0          1.67       1.66333    0.400802
teem-1.6.0-src dehex1          10.96      10.9367    0.21335
teem-1.6.0-src enhex1          41.1767    40.5733    1.48702
teem-1.6.0-src enhex0          6.18333    6.31       -2.0074
zlib-1.1.4 minigzip0           46.4867    46.2533    0.504468
zlib-1.1.4 minigzip            5.52333    5.50333    0.363416
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #49 from Oleg Endo 2012-08-30 22:54:23 UTC --- Kaz, if you have some time, could you please gather some CSiBE runtime numbers for '-mpretend-cmove' and without it? I've compared the resulting code size of the CSiBE set, and with -mpretend-cmove there's a total decrease of 948 bytes, with a few opposite cases. My idea was to obsolete the -mpretend-cmove option and instead tie its behavior to the new option -mzdcbranch, which generally is supposed to control any kind of zero-displacement-branch handling.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #48 from Oleg Endo 2012-08-20 20:51:12 UTC --- Author: olegendo Date: Mon Aug 20 20:51:06 2012 New Revision: 190544 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190544 Log: PR target/51244 * config/sh/sh.md (*cset_zero): New insns. PR target/51244 * gcc.target/sh/pr51244-11.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr51244-11.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #47 from Oleg Endo 2012-08-12 22:47:21 UTC --- Author: olegendo Date: Sun Aug 12 22:47:15 2012 New Revision: 190331 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190331 Log: PR target/51244 * config/sh/sh.md: Add splits for inverted compare and branch opportunities. (*cmpeqsi_t): New insn. (cmpgtsi_t, cmpgesi_t): Swap r and N alternatives. (cmpgeusi_t): Use satisfies_constraint_Z. Emit sett insn in replacement insn list and not in the preparation statements. (clrt, sett): Add mt_group attribute. PR target/51244 * gcc.target/sh/pr51244-7.c: New. * gcc.target/sh/pr51244-8.c: New. * gcc.target/sh/pr51244-9.c: New. * gcc.target/sh/pr51244-10.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr51244-10.c trunk/gcc/testsuite/gcc.target/sh/pr51244-7.c trunk/gcc/testsuite/gcc.target/sh/pr51244-8.c trunk/gcc/testsuite/gcc.target/sh/pr51244-9.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #46 from Oleg Endo 2012-08-09 15:55:23 UTC --- Author: olegendo Date: Thu Aug 9 15:55:18 2012 New Revision: 190258 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190258 Log: PR target/51244 * config/sh/sh.md: Add negc extu sequence peephole. (movrt, movnegt, movrt_negc, nott): Use t_reg_operand predicate. (*movrt_negc): New insn. * config/sh/sync.md (atomic_test_and_set): Pass gen_t_reg_rtx to gen_movnegt. * config/sh/sh.c (expand_cbranchsi4, sh_emit_scc_to_t, sh_emit_compare_and_branch, sh_emit_compare_and_set): Use get_t_reg_rtx. (sh_expand_t_scc): Pass gen_t_reg_rtx to gen_movnegt. PR target/51244 * gcc.target/sh/pr51244-5: New. * gcc.target/sh/pr51244-6: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr51244-5.c trunk/gcc/testsuite/gcc.target/sh/pr51244-6.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md trunk/gcc/config/sh/sync.md trunk/gcc/testsuite/ChangeLog
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #45 from Oleg Endo 2012-07-30 06:46:40 UTC --- Author: olegendo Date: Mon Jul 30 06:46:36 2012 New Revision: 189953 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189953 Log: PR target/51244 * config/sh/sh.md (mov_neg_si_t): Move to Scc instructions section. Use t_reg_operand predicate. Add split for negated case. (ashrsi2_31): Pass get_t_reg_rtx to gen_mov_neg_si_t. * config/sh/sh.c (expand_ashiftrt): Likewise. PR target/51244 * gcc.target/sh/pr51244-4.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr51244-4.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #44 from Oleg Endo 2012-07-26 00:20:05 UTC --- Author: olegendo Date: Thu Jul 26 00:19:58 2012 New Revision: 189877 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189877 Log: PR target/51244 * config/sh/sh.opt (mzdcbranch): New option. * doc/invoke.texi: Document it. * config/sh/sh.md (negsi_cond): Use TARGET_ZDCBRANCH as condition instead of TARGET_HARD_SH4. * config/sh/sh.c (sh_option_override): Set TARGET_ZDCBRANCH as default for TARGET_HARD_SH4. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md trunk/gcc/config/sh/sh.opt trunk/gcc/doc/invoke.texi
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #43 from Oleg Endo 2012-07-23 23:29:02 UTC --- I have noticed that on SH the CANONICALIZE_COMPARISON macro is not defined, although it seems to be useful for the combine pass. Another thing that I'd like to try out is using zero-displacement branches to implement conditional execution patterns and see how it performs.
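As a rough illustration of why comparison canonicalization matters on SH: the signed register compare instructions are cmp/eq, cmp/gt and cmp/ge, so a "less than" test has to be expressed with swapped operands. The toy model below is not GCC code and does not show what a CANONICALIZE_COMPARISON definition for SH would do in full. The types `cmp_code` and `comparison` and the functions `canonicalize` and `evaluate` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical comparison codes; a real backend uses RTL codes instead.  */
enum cmp_code { CMP_EQ, CMP_GT, CMP_GE, CMP_LT, CMP_LE };

struct comparison {
    enum cmp_code code;
    int lhs, rhs;
};

/* Rewrite LT/LE into GT/GE by swapping the operands, so that only the
   forms with a direct compare instruction remain.  */
static struct comparison canonicalize(struct comparison c)
{
    if (c.code == CMP_LT || c.code == CMP_LE) {
        int tmp = c.lhs;
        c.lhs = c.rhs;
        c.rhs = tmp;
        c.code = (c.code == CMP_LT) ? CMP_GT : CMP_GE;
    }
    return c;
}

/* Reference semantics, used to check that the rewrite preserves meaning.  */
static bool evaluate(struct comparison c)
{
    switch (c.code) {
    case CMP_EQ: return c.lhs == c.rhs;
    case CMP_GT: return c.lhs >  c.rhs;
    case CMP_GE: return c.lhs >= c.rhs;
    case CMP_LT: return c.lhs <  c.rhs;
    case CMP_LE: return c.lhs <= c.rhs;
    }
    assert(!"unknown comparison code");
    return false;
}
```

For any input, `evaluate(canonicalize(c))` equals `evaluate(c)`, and the canonical form never uses `CMP_LT` or `CMP_LE`.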
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #42 from Oleg Endo 2012-07-23 22:57:42 UTC --- Author: olegendo Date: Mon Jul 23 22:57:36 2012 New Revision: 189797 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189797 Log: PR target/51244 * config/sh/predicates.md (general_movsrc_operand, general_movdst_operand): Reject T_REG. * config/sh/sh.md (*extendqisi2_compact_reg, *extendhisi2_compact_reg, movsi_i, movsi_ie, movsi_i_lowpart, *movqi_reg_reg, *movhi_reg_reg): Remove T_REG alternatives. (*negtstsi): New insn. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/sh.md
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #41 from Oleg Endo 2012-07-08 15:03:26 UTC --- Author: olegendo Date: Sun Jul 8 15:03:21 2012 New Revision: 189360 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189360 Log: PR target/51244 * config/sh/sh.md (*branch_true_eq, *branch_false_ne, nott): New insns. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.md
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #40 from Oleg Endo 2012-07-02 19:24:03 UTC --- Author: olegendo Date: Mon Jul 2 19:23:56 2012 New Revision: 189177 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189177 Log: PR target/51244 * config/sh/predicates.md (t_reg_operand, negt_reg_operand): New predicates. * config/sh/sh-protos.h (get_t_reg_rtx): New prototype. * config/sh/sh.c (get_t_reg_rtx): New function. Use it when invoking gen_branch_true and gen_branch_false. * config/sh/sh.md: Use get_t_reg_rtx when invoking gen_branch_true and gen_branch_false. (branch_true, branch_false): Use t_reg_operand predicate. (*branch_true, *branch_false): Delete. (movt): Use t_reg_operand predicate. (*negnegt): Use negt_reg_operand predicate and fold little and big endian variants. (*movtt): Use t_reg_operand and fold little and big endian variants. (*movt_qi): Delete. PR target/51244 * gcc.target/sh/pr51244-1.c: Check that movt insn is not generated. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/sh-protos.h trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/sh/pr51244-1.c
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #39 from Oleg Endo 2012-06-30 12:00:38 UTC --- Created attachment 27724 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27724 Another patch I have noticed that the branch_true and branch_false insns also require some subreg variations to work properly. Otherwise redundant movt and tst insns are generated. I'm now testing the attached patch, which fixes those issues. In addition to that, the subreg 0 / subreg 3 copy-pasta has been removed by introducing t-bit predicates.
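The "subreg 0 / subreg 3" pair comes from the fact that a subreg offset is a byte offset in memory order: the low byte of a 32-bit value sits at offset 0 on a little-endian target and at offset 3 on a big-endian one. A small host-side sketch of that fact follows; `low_byte_offset` is a made-up helper, and the code says nothing about how the SH patterns themselves are written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the byte offset of the least significant byte of a 32-bit
   word in the host's memory layout.  */
static unsigned low_byte_offset(void)
{
    const uint32_t word = 0x000000AAu;
    unsigned char bytes[sizeof word];

    /* Copy the object representation so it can be inspected bytewise.  */
    memcpy(bytes, &word, sizeof word);
    for (unsigned i = 0; i < sizeof word; i++)
        if (bytes[i] == 0xAA)
            return i;

    assert(!"marker byte not found");
    return 0;
}
```

On the usual little-endian and big-endian hosts this returns 0 and 3 respectively, which is the same pair of offsets the duplicated patterns had to spell out.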
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #38 from Oleg Endo 2012-05-08 21:36:35 UTC --- Author: olegendo Date: Tue May 8 21:36:30 2012 New Revision: 187298 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187298 Log: PR target/51244 * config/sh/sh.md (*branch_true, *branch_false): New insns. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.md
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #37 from Oleg Endo 2012-05-07 20:50:31 UTC --- Created attachment 27336 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27336 Supplementary patch As of rev 187217, the pr51244-1.c target testcase fails at least for m4*. The attached patch adds some 'branch_true' and 'branch_false' subreg variants which combine tries to use. This seems to fix the problem. I still would like to know whether there is a better way of handling the little / big endian subreg offsets in the patterns without doing copy-pasta.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #36 from Oleg Endo 2012-03-20 20:33:30 UTC --- I have created a new PR 52642 for the libstdc++ failures.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #35 from Kazumoto Kojima 2012-03-20 01:45:14 UTC --- (In reply to comment #34)
> Interesting, thanks! I'll also test your patch and send it around, OK?
OK, thanks!
> I'm a bit confused... was the issue caused by my patches for this PR, or by
> something else?
I guess that it was caused by other changes but was latent for a while.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #34 from Oleg Endo 2012-03-20 01:04:19 UTC --- (In reply to comment #33)
> FYI, looking into the libstdc++ failures for sh4-unknown-linux-gnu,
> it seems that the call insn was swapped before prologue frame insns
> and then it makes unwinder confused. -fno-delayed-branch also stops
> that swapping for these failing cases. The patch below works for me.
> [...]
Interesting, thanks! I'll also test your patch and send it around, OK? I'm a bit confused... was the issue caused by my patches for this PR, or by something else?
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #33 from Kazumoto Kojima 2012-03-15 07:52:21 UTC --- (In reply to comment #31)
> Created attachment 26859 [details]
> testresult on sh4-unknown-linux-gnu [trunk revision 185088].
FYI, looking into the libstdc++ failures for sh4-unknown-linux-gnu, it seems that the call insn was swapped before prologue frame insns and then it makes unwinder confused. -fno-delayed-branch also stops that swapping for these failing cases. The patch below works for me.

	* config/sh/sh.c (sh_expand_prologue): Emit blockage at the end
	of prologue for unwinder and profiler.

--- ORIG/trunk/gcc/config/sh/sh.c 2012-03-06 10:28:32.0 +0900
+++ trunk/gcc/config/sh/sh.c 2012-03-14 20:22:15.0 +0900
@@ -7234,6 +7234,13 @@ sh_expand_prologue (void)
 	emit_insn (gen_shcompact_incoming_args ());
     }

+  /* If we are profiling, make sure no instructions are scheduled before
+     the call to mcount. Similarly if some call instructions are swapped
+     before frame related insns, it'll make unwinder confused because
+     currently SH has no unwind info for function epilogues. */
+  if (crtl->profile || flag_exceptions || flag_unwind_tables)
+    emit_insn (gen_blockage ());
+
   if (flag_stack_usage_info)
     current_function_static_stack_size = stack_usage;
 }
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #32 from Oleg Endo 2012-03-11 13:18:12 UTC --- Author: olegendo Date: Sun Mar 11 13:18:08 2012 New Revision: 185192 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185192 Log: PR target/51244 * config/sh/sh.md (movnegt): Expand into respective insns immediately. Use movrt_negc instead of negc pattern for non-SH2A. (*movnegt): Remove. (*movrt_negc, *negnegt, *movtt, *movt_qi): New insns and splits. PR target/51244 * gcc.target/sh/pr51244-1.c: Fix thinkos. Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/sh/pr51244-1.c
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #31 from Kazumoto Kojima 2012-03-09 10:36:31 UTC --- Created attachment 26859 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26859 A test result testresult on sh4-unknown-linux-gnu [trunk revision 185088].
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #30 from Oleg Endo 2012-03-09 10:02:25 UTC --- (In reply to comment #29)
> (In reply to comment #28)
> Regtest on sh4-unknown-linux-gnu has been done successfully.
> Oleg, your patch is pre-approved.
Thanks a lot! Could you please attach the testsuite summary of your setup? I'd like to compare it to my results (in particular the libstdc++ tests). I'm now getting similar effects as in comment #9 again, where a bunch of libstdc++ failures disappear and this time one new failure appears:
FAIL: 21_strings/basic_string/cons/char/6.cc execution test
This is weird...
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #29 from Kazumoto Kojima 2012-03-09 08:40:32 UTC --- (In reply to comment #28) Regtest on sh4-unknown-linux-gnu has been done successfully. Oleg, your patch is pre-approved.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #28 from Kazumoto Kojima 2012-03-09 01:44:52 UTC --- (In reply to comment #27)
> Created attachment 26858 [details]
> Patch for the patch
It looks like all the Fortran regressions have gone away. I'll run full tests on sh4-unknown-linux-gnu.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 Oleg Endo changed:

What                          |Removed |Added
Attachment #26853 is obsolete |0       |1

--- Comment #27 from Oleg Endo 2012-03-09 00:26:39 UTC ---
Created attachment 26858 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26858
Patch for the patch
> Old tests that failed, that have disappeared: (Eeek!)
>
> 22_locale/ctype/is/char/3.cc execution test
> 27_io/basic_filebuf/underflow/wchar_t/9178.cc execution test
> gfortran.dg/widechar_intrinsics_6.f90 -Os execution test
That was a feature ;)
> I've attached .s files against gfortran.dg/associated_4.f90 -O1 with
> patched/unpatched compilers.
I'm sorry, I got the definition of the negc opcode wrong in the movrt_negc pattern. negc leaves the T bit always at '1' in this particular case, instead of inverting the T bit. It is funny that in C/C++ code it was never actually trying to re-use the T bit after the negc, but in Fortran it did. And that's what went wrong. I'm now testing the attached patch for C/C++ ...
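The negc behaviour described here can be checked with a small software model. The sketch below assumes the usual definition of the instruction (Rn = 0 - Rm - T, with the borrow written back to T); `negc_result` and `negc` are illustrative names, not part of GCC or of any SH toolchain header.

```c
#include <assert.h>
#include <stdint.h>

struct negc_result {
    uint32_t rn;  /* destination register value */
    unsigned t;   /* T bit after the instruction */
};

/* Software model of "negc Rm,Rn": Rn = 0 - Rm - T, T = borrow.  */
static struct negc_result negc(uint32_t rm, unsigned t)
{
    struct negc_result r;
    uint32_t tmp;

    assert(t <= 1);
    tmp = 0u - rm;                         /* first subtraction */
    r.rn = tmp - t;                        /* then subtract the T bit */
    r.t = (tmp != 0u) || (tmp < r.rn);     /* borrow out of either step */
    return r;
}
```

With Rm holding -1, as in the movrt_negc sequence, the model yields Rn = 1 - T and T = 1 for both input values of T: the register receives the inverted T bit, but the T bit itself ends up set rather than inverted.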
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #26 from Kazumoto Kojima 2012-03-08 11:16:39 UTC --- Created attachment 26855 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26855 unworked .s file associated_4_bad.s I've attached .s files against gfortran.dg/associated_4.f90 -O1 with patched/unpatched compilers.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #25 from Kazumoto Kojima 2012-03-08 11:13:39 UTC --- Created attachment 26854 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26854 worked .s file associated_4_good.s
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #24 from Kazumoto Kojima 2012-03-08 11:11:32 UTC --- (In reply to comment #23)
> Kaz, if you have some time, could you try it out in your setup, too please?
On trunk revision 185088, for sh4-unknown-linux-gnu, the result of compare_tests is:

New tests that FAIL:

gfortran.dg/associated_4.f90 -O1 execution test
gfortran.dg/forall_4.f90 -O3 -fomit-frame-pointer execution test
gfortran.dg/forall_4.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
gfortran.dg/forall_4.f90 -O3 -fomit-frame-pointer -funroll-loops execution test
gfortran.dg/forall_4.f90 -O3 -g execution test

Old tests that failed, that have disappeared: (Eeek!)

22_locale/ctype/is/char/3.cc execution test
27_io/basic_filebuf/underflow/wchar_t/9178.cc execution test
gfortran.dg/widechar_intrinsics_6.f90 -Os execution test
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #23 from Oleg Endo 2012-03-08 01:25:21 UTC --- Created attachment 26853 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26853 Patch for the patch The attached patch seems to fix the problem. GCC (C,C++) and CSiBE set compiles with it. Now doing the full testsuite... Kaz, if you have some time, could you try it out in your setup, too please? A thing that bugs me regarding the attached patch is the big/little endian subreg copy pasta in the patterns *negnegt, *movtt, *movt_qi. Isn't there a way to avoid that?
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #22 from Oleg Endo 2012-03-06 23:42:15 UTC ---
This is a reduced test case:

int test (volatile int* a, int b, int c)
{
  a[1] = b != 0;
  if (b == 0)
    a[10] = c;
  return b == 0;
}

with '-O2 -m4-single -mb' it gets compiled to:

        tst     r5,r5           ! b == 0 -> T
        mov     #-1,r1
        negc    r1,r1           ! b != 0 -> T, r1
        mov.l   r1,@(4,r4)
        bf      .L2             ! branch if (b == 0)
        mov.l   r6,@(40,r4)
.L2:
        tst     r5,r5
        rts
        movt    r0

This is because in the 'movnegt' expander it is not mentioned that the T bit is modified and the first CSE pass optimizes away the 'b == 0' test before the branch. I'm trying to come up with some alternative approaches...
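A host-side check of what the reduced test case is required to do can help when comparing good and bad code: the store to a[10] must happen exactly when b is zero. The following is a sketch only. The function is the test case from this comment, renamed `pr51244_test` here to avoid name clashes, and `stores_c` is a hypothetical helper.

```c
#include <assert.h>

/* The reduced test case from this comment (renamed for this sketch).  */
int pr51244_test(volatile int *a, int b, int c)
{
    a[1] = b != 0;
    if (b == 0)
        a[10] = c;
    return b == 0;
}

/* Run the test case on a scratch buffer.  Checks the return value and
   a[1], and reports whether a[10] was overwritten with c.  */
static int stores_c(int b, int c)
{
    volatile int buf[11] = { 0 };
    int ret;

    buf[10] = ~c;                  /* sentinel, always different from c */
    ret = pr51244_test(buf, b, c);
    assert(ret == (b == 0));
    assert(buf[1] == (b != 0));
    return buf[10] == c;
}
```

Correctly compiled code gives `stores_c(0, 42) == 1` and `stores_c(7, 42) == 0`. Code in which the branch consumes a stale or clobbered T bit would be expected to get one of the two cases wrong.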
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #21 from Oleg Endo 2012-03-06 11:29:17 UTC --- (In reply to comment #20)
> I've confirmed that 22_locale/ctype/is/char/3.cc doesn't fail
> if linking with libstdc++.a which is built with the compiler
> without r184966 changes. The .s files for 3.cc are the same with
> both compilers. It looks like the problematic object is
> libstdc++-v3/src/c++98/ctype_configure_char.o because the error
> went away if replacing it with another one. I've attached .i and
> .s files for that file. The option used is [...]
Cool. Thanks a lot! I think I know what the problem is now. Looking into it...
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #20 from Kazumoto Kojima 2012-03-06 10:40:31 UTC --- (In reply to comment #16)
> Can we keep the r184966 changes anyways? I will keep an eye on these failures
> and see whether I can reproduce them. If you have some time, could you please send me
> the intermediate .i and .s files of the failing and passing version of the
> '22_locale/ctype/is/char/3.cc' test case?
I've confirmed that 22_locale/ctype/is/char/3.cc doesn't fail if linking with libstdc++.a which is built with the compiler without r184966 changes. The .s files for 3.cc are the same with both compilers. It looks like the problematic object is libstdc++-v3/src/c++98/ctype_configure_char.o because the error went away if replacing it with another one. I've attached .i and .s files for that file. The option used is
COLLECT_GCC_OPTIONS='-shared-libgcc' '-B' '/exp/ldroot/dodes/xsh-gcc/./gcc' '-nostdinc++' '-L/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/src' '-L/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/src/.libs' '-B' '/usr/local/sh4-unknown-linux-gnu/bin/' '-B' '/usr/local/sh4-unknown-linux-gnu/lib/' '-isystem' '/usr/local/sh4-unknown-linux-gnu/include' '-isystem' '/usr/local/sh4-unknown-linux-gnu/sys-include' '-I' '/exp/ldroot/dodes/ORIG/trunk/libstdc++-v3/../libgcc' '-I' '/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/include/sh4-unknown-linux-gnu' '-I' '/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/include' '-I' '/exp/ldroot/dodes/ORIG/trunk/libstdc++-v3/libsupc++' '-fno-implicit-templates' '-Wall' '-Wextra' '-Wwrite-strings' '-Wcast-qual' '-Wabi' '-fdiagnostics-show-location=once' '-ffunction-sections' '-fdata-sections' '-frandom-seed=ctype_configure_char.lo' '-g' '-O2' '-D' '_GNU_SOURCE' '-S' '-fPIC' '-D' 'PIC' '-o'
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #19 from Kazumoto Kojima 2012-03-06 10:38:22 UTC --- Created attachment 26839 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26839 unworked .s file ctype_configure_char_bad.s
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #18 from Kazumoto Kojima 2012-03-06 10:37:13 UTC --- Created attachment 26838 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26838 worked .s file ctype_configure_char_good.s
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #17 from Kazumoto Kojima 2012-03-06 10:36:01 UTC --- Created attachment 26837 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26837 preprocessed file ctype_configure_char.i
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #16 from Oleg Endo 2012-03-06 09:48:31 UTC --- (In reply to comment #15)
> I've seen the same failures on sh4-unknown-linux-gnu for trunk rev 184971.
> After backing out the r184966 changes, they went away. Weird.
Can we keep the r184966 changes anyways? I will keep an eye on these failures and see whether I can reproduce them. If you have some time, could you please send me the intermediate .i and .s files of the failing and passing version of the '22_locale/ctype/is/char/3.cc' test case?
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #15 from Kazumoto Kojima 2012-03-06 08:49:27 UTC --- (In reply to comment #14)
> I've run the testsuite on rev 184966 (without fortran though), but the failures
> that you've mentioned did not show up. Usually when I rebuild the whole
> toolchain including newlib, I have C/CPP/CXXFLAGS_FOR_TARGET set to '-Os
> -mpretend-cmove'. This time I removed those, but the results seem to be the
> same. Could you also please try again? This is suspicious...
I've seen the same failures on sh4-unknown-linux-gnu for trunk rev 184971. After backing out the r184966 changes, they went away. Weird.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #14 from Oleg Endo 2012-03-06 08:26:06 UTC --- (In reply to comment #13)
> On Tue, 2012-03-06 at 08:13 +0900, Kaz Kojima wrote:
>
> > I've tested your latest patch on sh4-unknown-linux-gnu with trunk
> > revision 184872. It looks like some new failures are popping up:
> >
> > New tests that FAIL:
> >
> > 22_locale/ctype/is/char/3.cc execution test
> > 27_io/basic_filebuf/underflow/wchar_t/9178.cc execution test
> > gfortran.dg/widechar_intrinsics_6.f90 -Os execution test
> >
> > Perhaps the failures are the ones you've suggested in #10 in
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244
> >
> > Could you double check?
>
> Of course! Doing so now...
I've run the testsuite on rev 184966 (without fortran though), but the failures that you've mentioned did not show up. Usually when I rebuild the whole toolchain including newlib, I have C/CPP/CXXFLAGS_FOR_TARGET set to '-Os -mpretend-cmove'. This time I removed those, but the results seem to be the same. Could you also please try again? This is suspicious...
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #13 from Oleg Endo 2012-03-05 23:37:35 UTC ---
On Tue, 2012-03-06 at 08:13 +0900, Kaz Kojima wrote:
> I've tested your latest patch on sh4-unknown-linux-gnu with trunk
> revision 184872. It looks like some new failures are popping up:
>
> New tests that FAIL:
>
> 22_locale/ctype/is/char/3.cc execution test
> 27_io/basic_filebuf/underflow/wchar_t/9178.cc execution test
> gfortran.dg/widechar_intrinsics_6.f90 -Os execution test
>
> Perhaps the failures are the ones you've suggested in #10 in
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244
>
> Could you double check?
Of course! Doing so now...
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #12 from Oleg Endo 2012-03-05 23:12:27 UTC --- Author: olegendo Date: Mon Mar 5 23:12:20 2012 New Revision: 184966 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184966 Log: PR target/51244 * config/sh/sh.c (sh_expand_t_scc): Remove SH2A special case and use unified expansion logic. * config/sh/sh.md (xorsi3_movrt): Rename to movrt. Move closer to the existing movt insn. (negc): Rename insn to *negc. Add new expander. (movnegt): Use xor pattern for T bit negation. Reserve helper constant for negc pattern. (*movnegt): New insn and splitter. PR target/51244 * gcc.target/sh/pr51244-1.c: New. * gcc.target/sh/pr51244-2.c: New. * gcc.target/sh/pr51244-3.c: New. Added: trunk/gcc/testsuite/gcc.target/sh/pr48596.c trunk/gcc/testsuite/gcc.target/sh/pr51244-1.c trunk/gcc/testsuite/gcc.target/sh/pr51244-2.c trunk/gcc/testsuite/gcc.target/sh/pr51244-3.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sh/sh.c trunk/gcc/config/sh/sh.md trunk/gcc/testsuite/ChangeLog
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 Oleg Endo changed:

What                          |Removed |Added
Attachment #26812 is obsolete |0       |1

--- Comment #11 from Oleg Endo 2012-03-04 17:24:44 UTC ---
Created attachment 26822 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26822
Proposed patch
This patch should be better now. However, I'm not sure how well this will work with SH64 due to the (arbitrary) TARGET_SH1 conditions in the insns.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 --- Comment #10 from Oleg Endo 2012-03-03 12:32:29 UTC --- (In reply to comment #9)
> Created attachment 26812 [details]
> Proposed patch
>
> I've tested this patch again against rev 184764 (GCC 4.7) with
>
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a-single/-mb,-m4-single/-ml,-m4-single/-mb,-m4a-single/-ml,-m4a-single/-mb}"
>
> Surprisingly, it fixes the following libstdc++ tests.
>
That was a false alarm. I've messed up the test results somehow. The libstdc++ test case fixes have nothing to do with the patch, but rather with the difference between rev 184764 and rev 184829. Sorry for any confusion.
>
> However, it also introduces two new failures.
>
> For all sub targets:
> FAIL: 21_strings/basic_string/cons/char/6.cc execution test
>
> For -m4a-single and -m4-single (-ml and -mb):
> FAIL: 22_locale/ctype/is/char/3.cc execution test
>
> I'm looking into what is happening in the two cases.
It seems that when building newlib something gets messed up related to delayed branches. Building newlib with -fno-delayed-branch seems to make the failures go away.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244 Oleg Endo changed:

What                          |Removed |Added
Attachment #26191 is obsolete |0       |1

--- Comment #9 from Oleg Endo 2012-03-02 21:56:38 UTC ---
Created attachment 26812 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26812
Proposed patch
I've tested this patch again against rev 184764 (GCC 4.7) with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a-single/-mb,-m4-single/-ml,-m4-single/-mb,-m4a-single/-ml,-m4a-single/-mb}"

Surprisingly, it fixes the following libstdc++ tests.

For all sub targets:
23_containers/forward_list/requirements/exception/basic.cc
23_containers/forward_list/requirements/exception/propagation_consistent.cc
23_containers/list/requirements/exception/basic.cc
23_containers/list/requirements/exception/propagation_consistent.cc
23_containers/multiset/requirements/exception/basic.cc
23_containers/multiset/requirements/exception/propagation_consistent.cc
23_containers/unordered_map/requirements/exception/propagation_consistent.cc
23_containers/unordered_multimap/requirements/exception/basic.cc
23_containers/unordered_multiset/requirements/exception/basic.cc
23_containers/unordered_multiset/requirements/exception/propagation_consistent.cc
23_containers/unordered_set/requirements/exception/propagation_consistent.cc
ext/pb_ds/regression/list_update_map_rand.cc
ext/pb_ds/regression/list_update_set_rand.cc

For -m4a-single and -m4-single (-ml and -mb):
23_containers/forward_list/requirements/exception/basic.cc
23_containers/forward_list/requirements/exception/propagation_consistent.cc
23_containers/list/requirements/exception/basic.cc
23_containers/list/requirements/exception/propagation_consistent.cc
23_containers/multiset/requirements/exception/basic.cc
23_containers/multiset/requirements/exception/propagation_consistent.cc

However, it also introduces two new failures.

For all sub targets:
FAIL: 21_strings/basic_string/cons/char/6.cc execution test

For -m4a-single and -m4-single (-ml and -mb):
FAIL: 22_locale/ctype/is/char/3.cc execution test

I'm looking into what is happening in the two cases.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

olegendo at gcc dot gnu.org changed:

           What    |Removed                       |Added
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2012-02-26
                 CC|                              |olegendo at gcc dot gnu.org
         AssignedTo|unassigned at gcc dot gnu.org |olegendo at gcc dot gnu.org
     Ever Confirmed|0                             |1
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #8 from Oleg Endo 2011-12-30 21:21:14 UTC ---
(In reply to comment #7)
> (In reply to comment #3)
> > I haven't run all tests on it yet, but CSiBE shows average code size
> > reduction of approx. -0.1% for -m4* with some code size increases in
> > some files.
> > Would something like that be OK for stage 3?
>
> Looks good, though not appropriate for stage 3, I think.

The patch passed the testsuite without new failures. I'll queue it up for
stage 1.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #7 from Kazumoto Kojima 2011-12-28 22:25:48 UTC ---
(In reply to comment #3)
> I haven't run all tests on it yet, but CSiBE shows average code size reduction
> of approx. -0.1% for -m4* with some code size increases in some files.
> Would something like that be OK for stage 3?

Looks good, though not appropriate for stage 3, I think.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #6 from Oleg Endo 2011-12-28 15:59:35 UTC ---
(In reply to comment #3)
> Created attachment 26191 [details]
> Proposed patch to improve some of the issues.
>
> The attached patch removes the useless sequence and still allows the -1
> constant to be CSE-ed for such cases as the example function above.
>
> I haven't run all tests on it yet, but CSiBE shows average code size reduction
> of approx. -0.1% for -m4* with some code size increases in some files.

Some of the code size increases are caused by the ifcvt.c pass, which tries to
transform sequences like:

int test_func_6 (int a, int b, int c)
{
  if (a == 16)
    c = 0;
  return b + c;
}

into branch-free code like:

        mov     r4,r0      ! 45  movsi_ie/2         [length = 2]
        cmp/eq  #16,r0     ! 9   cmpeqsi_t/2        [length = 2]
        mov     #-1,r0     ! 34  movsi_ie/3         [length = 2]
        negc    r0,r0      ! 38  *negc              [length = 2]
        neg     r0,r0      ! 36  negsi2             [length = 2]
        and     r6,r0      ! 37  *andsi3_compact/2  [length = 2]
        rts                ! 48  *return_i          [length = 2]
        add     r5,r0      ! 14  *addsi3_compact    [length = 2]

instead of the more compact (and on SH4 most likely better):

        mov     r4,r0      ! 41  movsi_ie/2         [length = 2]
        cmp/eq  #16,r0     ! 9   cmpeqsi_t/2        [length = 2]
        bf      0f         ! 34  *movsicc_t_true/2  [length = 4]
        mov     #0,r6
0:
        add     r5,r6      ! 14  *addsi3_compact    [length = 2]
        rts                ! 44  *return_i          [length = 2]
        mov     r6,r0      ! 19  movsi_ie/2         [length = 2]

This particular case is handled in noce_try_store_flag_mask, which does the
transformation if BRANCH_COST >= 2, which is true for -m4. I guess that before
the patch ifcvt didn't realize that this transformation can be applied.
I've tried setting BRANCH_COST to 1, which avoids this transformation but
increases overall code size a bit.
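For readers who want to see what the branch-free form computes, here is a minimal C sketch of the store-flag-mask idea applied to test_func_6. It follows the negc/neg/and steps of the first listing, assuming the usual negc semantics (Rn = 0 - Rm - T). The function names and the forms_agree helper are made up for this illustration and do not come from the patch or from ifcvt.c.

```c
#include <assert.h>

/* Branchy form, as written in test_func_6. */
int test_func_6_branch(int a, int b, int c)
{
    if (a == 16)
        c = 0;
    return b + c;
}

/* Branch-free form: the mask is all ones when a != 16 and zero otherwise. */
int test_func_6_mask(int a, int b, int c)
{
    int t = (a == 16);      /* cmp/eq #16,r0 : T bit                   */
    int mask = -(1 - t);    /* mov #-1 ; negc ; neg : yields 0 or -1   */
    return b + (c & mask);  /* and r6,r0 ; add r5,r0                   */
}

/* Hypothetical helper: returns 1 if both forms agree on a small grid. */
int forms_agree(void)
{
    for (int a = 0; a < 32; a++)
        for (int c = -4; c <= 4; c++)
            if (test_func_6_branch(a, 7, c) != test_func_6_mask(a, 7, c))
                return 0;
    return 1;
}
```

The mask form trades the conditional branch for three extra ALU instructions, which is why it only pays off when branches are considered expensive.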
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #5 from Oleg Endo 2011-12-28 02:44:05 UTC ---
(In reply to comment #2)
> (In reply to comment #1)
> >
> > BTW, OT, (a != b || a != c) ? b : c could be reduced to b, I think.
> >
>
> Yes, very much so.
> It is reduced to "return b" for -m2, -m2e, -m2a, -m3, -m3e
> but not for -m1 and -m4*.

This seems to be due to the following in sh.h:

#define BRANCH_COST(speed_p, predictable_p) \
    (TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)
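The nested conditional in that macro is easy to misread, so here is a small C sketch that spells out how it parses. The struct and function names are hypothetical stand-ins for the real target flags. The flag settings used for each sub target in the note below are assumptions inferred from the discussion, not taken from sh.h.

```c
#include <assert.h>

/* Hypothetical stand-in for the target flags tested by the macro. */
struct sh_flags {
    int sh5;       /* models TARGET_SH5      */
    int sh2;       /* models TARGET_SH2      */
    int hard_sh4;  /* models TARGET_HARD_SH4 */
};

/* The expression exactly as written in the macro. */
int branch_cost_as_written(struct sh_flags f)
{
    return f.sh5 ? 1 : ! f.sh2 || f.hard_sh4 ? 2 : 1;
}

/* The same expression with the implicit grouping made explicit:
   || binds tighter than ?:, and ?: associates to the right. */
int branch_cost_explicit(struct sh_flags f)
{
    return f.sh5 ? 1 : ((!f.sh2 || f.hard_sh4) ? 2 : 1);
}
```

Assuming -m1 leaves the SH2 flag clear, -m2 sets only the SH2 flag, and -m4 sets both the SH2 and hard-SH4 flags, the cost comes out as 2, 1 and 2 respectively, which matches the observation that only -m1 and -m4* behave differently.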
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #4 from Oleg Endo 2011-12-27 23:17:03 UTC ---
(In reply to comment #1)
>
> > return a >= 0 && b >= 0 ? c : d;
>
> x >= 0 is expanded to the sequence like
>
> ra = not x
> rb = -31
> rc = ra >> (neg rb)
> T = (rc == 0)
> conditional jump
>
> and combine tries to simplify it. combine simplifies b >= 0
> successfully into shll and bt but fails to simplify a >= 0.
> It seems that combine doesn't do constant propagation well and
> misses the constant -31.

Another simpler fail:

int test_func_22_NG (int a, int b, int c, int d)
{
  return a >= 0;
}

becomes:

        not     r4,r0      ! 9   one_cmplsi2  [length = 2]
        mov     #-31,r1    ! 12  movsi_ie/3   [length = 2]
        rts                ! 31  *return_i    [length = 2]
        shld    r1,r0      ! 13  lshrsi3_d    [length = 2]

which could be:

        cmp/pz  r4
        rts
        movt    r0

From what I could observe, this is caused by the various shift insns, which
lead combine to this result. For example, the shll, branch sequence that is
used instead of cmp/pz, branch is caused by the ashlsi_c insn, which defines a
lt:SI comparison. Although that is correct, using cmp/pz could be better, since
it does not modify the reg, and on SH4 it is an MT group insn.
The ashlsi_c insn / lt:SI picking can be avoided by adjusting the rtx costs,
for instance (just tried it out briefly).

I think a peephole in this case could fix some of the symptoms but not the
actual cause. I'll see if I can come up with something that works without a
peephole, even though all the shift stuff looks a bit suspicious ;)
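As a quick sanity check of the not / shld listing, the following C sketch models the two instructions on 32-bit values and compares the result with the plain comparison. The function names are invented for this example. The sketch assumes shld with a count of -31 performs a logical right shift by 31, as the lshrsi3_d insn name suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the comparison as written in test_func_22_NG. */
int ge_zero_ref(int32_t a)
{
    return a >= 0;
}

/* Model of the emitted sequence: invert, then shift the sign bit down. */
int ge_zero_shift(int32_t a)
{
    uint32_t inverted = ~(uint32_t)a;  /* not  r4,r0                       */
    return (int)(inverted >> 31);      /* shld r1,r0 with r1 = -31         */
}
```

Both forms yield 1 for non-negative inputs and 0 otherwise. The cmp/pz and movt pair computes the same value without needing the -31 constant in a register.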
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #3 from Oleg Endo 2011-12-27 22:43:11 UTC ---
Created attachment 26191
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26191
Proposed patch to improve some of the issues.

(In reply to comment #1)
>
> [...]
>
> mov #-1,rn
> negc rn,rm
> tst #255,rm
>
> which is essentially T_reg = T_reg. Usually combine catches such
> situation, but negc might be too complex for combine.
> For this case, replacing current movnegt expander by insn, splitter
> and peephole something like
>
> [...]
>
> the above useless sequence could be removed, though we will miss
> the chance that the -1 can be CSE-ed when the cstore value is
> used. This will cause a bit worse code for the loop like
>
> int
> foo (int *a, int x, int n)
> {
>   int i;
>   int count;
>
>   for (i = 0; i < n; i++)
>     count += (*(a + i) != x);
>
>   return count;
> }
>

Thanks for your ideas and comments. They were really useful.
The attached patch removes the useless sequence and still allows the -1
constant to be CSE-ed for such cases as the example function above.

I haven't run all tests on it yet, but CSiBE shows average code size reduction
of approx. -0.1% for -m4* with some code size increases in some files.
Would something like that be OK for stage 3?
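To make the "T_reg = T_reg" remark about the quoted mov / negc / tst sequence concrete, here is a small C model of the three instructions. It assumes the usual SH semantics (negc computes 0 - Rm - T, and tst sets T when the AND result is zero). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Takes the incoming T bit (0 or 1) and returns the T bit after
   mov #-1,rn ; negc rn,rm ; tst #255,rm. */
int t_after_sequence(int t_in)
{
    uint32_t rn = 0xFFFFFFFFu;               /* mov  #-1,rn                 */
    uint32_t rm = 0u - rn - (uint32_t)t_in;  /* negc rn,rm : rm = 1 - T     */
    return (rm & 255u) == 0;                 /* tst  #255,rm : T = (rm == 0
                                                in the low byte)            */
}
```

For either input the returned bit equals the incoming one, so the three instructions only matter when the value in rm is used for something else.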
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #2 from Oleg Endo 2011-12-27 21:26:33 UTC ---
(In reply to comment #1)
>
> BTW, OT, (a != b || a != c) ? b : c could be reduced to b, I think.
>

Yes, very much so.
It is reduced to "return b" for -m2, -m2e, -m2a, -m3, -m3e
but not for -m1 and -m4*.

The correct test function should rather be:

int test_func_0_NG (int a, int b, int c, int d)
{
  return (a != b || a != d) ? b : c;
}

which is actually OK for all variants except -m1 and -m4*:

        cmp/eq  r5,r4      ! 11  cmpeqsi_t/3   [length = 2]
        bf.s    .L6        ! 12  branch_false  [length = 2]
        cmp/eq  r7,r5      ! 14  cmpeqsi_t/3   [length = 2]
        bf      .L6        ! 15  branch_false  [length = 2]
        mov     r6,r5      ! 8   movsi_i/2     [length = 2]
.L6:
        rts                ! 42  *return_i     [length = 2]
        mov     r5,r0      ! 23  movsi_i/2     [length = 2]
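The reduction to b can be checked by brute force. The sketch below is only an illustration with invented function names. It evaluates the original expression on a small grid and also shows an input for which the corrected expression, the one comparing against d, does not return b.

```c
#include <assert.h>

/* Expression from the original report: always equal to b. */
int select_with_c(int a, int b, int c)
{
    return (a != b || a != c) ? b : c;
}

/* Corrected test expression, which compares against d instead. */
int select_with_d(int a, int b, int c, int d)
{
    return (a != b || a != d) ? b : c;
}

/* Hypothetical helper: returns 1 if select_with_c yields b on the grid. */
int select_with_c_is_b(void)
{
    for (int a = -3; a <= 3; a++)
        for (int b = -3; b <= 3; b++)
            for (int c = -3; c <= 3; c++)
                if (select_with_c(a, b, c) != b)
                    return 0;
    return 1;
}
```

The c arm of the first expression is only reached when a == b and a == c, in which case c equals b anyway. With d in the second comparison that argument no longer holds, so the compiler has to keep both comparisons.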
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #1 from Kazumoto Kojima 2011-11-22 22:33:43 UTC ---
> return (a != b || a != c) ? b : c;

The test_func_0_NG and test_func_1_NG cases are related to the target
implementation of cstoresi4. The middle end expands a complex conditional
jump to cstores and simple conditional jumps. For the expression a != b,
SH's cstoresi4 implementation uses sh.c:sh_emit_compare_and_set, which
generates cmp/eq and movnegt insns, because we have no cmp/ne insn.
Then we've got the sequence

  mov #-1,rn
  negc rn,rm
  tst #255,rm

which is essentially T_reg = T_reg. Usually combine catches such
situations, but negc might be too complex for combine.
For this case, by replacing the current movnegt expander with an insn,
splitter and peephole, something like

(define_insn "movnegt"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
        (plus:SI (reg:SI T_REG) (const_int -1)))
   (clobber (match_scratch:SI 1 "=&r"))
   (clobber (reg:SI T_REG))]
  ""
  "#"
  [(set_attr "length" "4")])

(define_split
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
        (plus:SI (reg:SI T_REG) (const_int -1)))
   (clobber (match_scratch:SI 1 "=&r"))
   (clobber (reg:SI T_REG))]
  "reload_completed"
  [(set (match_dup 1) (const_int -1))
   (parallel [(set (match_dup 0)
                   (neg:SI (plus:SI (reg:SI T_REG)
                                    (match_dup 1))))
              (set (reg:SI T_REG)
                   (ne:SI (ior:SI (reg:SI T_REG) (match_dup 1))
                          (const_int 0)))])]
  "")

(define_peephole2
  [(set (match_operand:SI 1 "" "") (const_int -1))
   (parallel [(set (match_operand:SI 0 "" "")
                   (neg:SI (plus:SI (reg:SI T_REG)
                                    (match_dup 1))))
              (set (reg:SI T_REG)
                   (ne:SI (ior:SI (reg:SI T_REG) (match_dup 1))
                          (const_int 0)))])
   (set (reg:SI T_REG)
        (eq:SI (match_operand:QI 3 "" "") (const_int 0)))]
  "REGNO (operands[3]) == REGNO (operands[0])
   && peep2_reg_dead_p (3, operands[0])
   && peep2_reg_dead_p (3, operands[1])"
  [(const_int 0)]
  "")

the above useless sequence could be removed, though we will miss
the chance that the -1 can be CSE-ed when the cstore value is
used. This will cause a bit worse code for the loop like

int
foo (int *a, int x, int n)
{
  int i;
  int count;

  for (i = 0; i < n; i++)
    count += (*(a + i) != x);

  return count;
}

though it may be relatively rare.
BTW, OT, (a != b || a != c) ? b : c could be reduced to b, I think.

> return a >= 0 && b >= 0 ? c : d;

x >= 0 is expanded to the sequence like

  ra = not x
  rb = -31
  rc = ra >> (neg rb)
  T = (rc == 0)
  conditional jump

and combine tries to simplify it. combine simplifies b >= 0
successfully into shll and bt but fails to simplify a >= 0.
It seems that combine doesn't do constant propagation well and
misses the constant -31. In this case, a peephole like

(define_peephole2
  [(set (match_operand:SI 0 "arith_reg_dest" "")
        (not:SI (match_operand:SI 1 "arith_reg_operand" "")))
   (set (match_operand:SI 2 "arith_reg_dest" "")
        (const_int -31))
   (set (match_operand:SI 3 "arith_reg_dest" "")
        (lshiftrt:SI (match_dup 0)
                     (neg:SI (match_dup 2))))
   (set (reg:SI T_REG)
        (eq:SI (match_operand:QI 4 "arith_reg_operand" "")
               (const_int 0)))
   (set (pc)
        (if_then_else (match_operator 5 "comparison_operator"
                        [(reg:SI T_REG) (const_int 0)])
                      (label_ref (match_operand 6 "" ""))
                      (pc)))]
  "REGNO (operands[3]) == REGNO (operands[4])
   && peep2_reg_dead_p (4, operands[0])
   && (peep2_reg_dead_p (4, operands[3])
       || rtx_equal_p (operands[2], operands[3]))
   && peep2_regno_dead_p (5, T_REG)"
  [(set (match_dup 2) (const_int -31))
   (set (reg:SI T_REG)
        (ge:SI (match_dup 1) (const_int 0)))
   (set (pc)
        (if_then_else (match_op_dup 7 [(reg:SI T_REG) (const_int 0)])
                      (label_ref (match_dup 6))
                      (pc)))]
  "
{
  operands[7] = gen_rtx_fmt_ee (reverse_condition (GET_CODE (operands[5])),
                                GET_MODE (operands[5]),
                                XEXP (operands[5], 0), XEXP (operands[5], 1));
}")

will be a workaround. It isn't ideal, but better than nothing.

> return a == b ? test_sub0 (a, b) : test_sub1 (a, b);
> return a != b ? test_sub0 (a, b) : test_sub1 (a, b);

This case is interesting. At -Os, two calls are converted into one
computed goto.
A bit surprisingly, the conversion is done as a side effect of the
combine-stack-adjustments pass. That pass calls

  cleanup_cfg (flag_crossjumping ? CLEANUP_CROSSJUMP : 0);

and the cross jumping optimization merges two calls.
With -Os -fno-delayed-branch, the OK case is compiled to

test_func_3_OK:
        mov     r4,r1
        cmp/eq  r5,r1
        mov.l   .L4,r0
        bf      .L3
        mov     r1,r5
        mov.l   .L5,r0
        bra     .L3
        nop
.L3:
        jmp     @r0
        nop

and the NG case

test_func_3_NG:
        mov     r4,r1
        cmp/eq  r5,r1
        bt      .L2
        mov.l   .L4,r0
        bra     .L