subject:"\[Bug target\/51244\] SH Target\: Inefficient conditional branch"

[Bug target/51244] SH Target: Inefficient conditional branch

2012-09-04 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #52 from Oleg Endo  2012-09-04 
08:03:08 UTC ---
Author: olegendo
Date: Tue Sep  4 08:03:01 2012
New Revision: 190909

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190909
Log:
PR target/51244
* config/sh/sh.c (prepare_cbranch_operands): Pull out comparison
canonicalization code into...
* (sh_canonicalize_comparison): This new function.
* config/sh/sh-protos.h: Declare it.
* config/sh/sh.h: Use it in new macro CANONICALIZE_COMPARISON.
* config/sh/sh.md (cbranchsi4): Remove TARGET_CBRANCHDI4 check and
always invoke expand_cbranchsi4.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.h
trunk/gcc/config/sh/sh.md

[Bug target/51244] SH Target: Inefficient conditional branch

2012-08-31 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #51 from Oleg Endo  2012-08-31 
15:50:35 UTC ---
(In reply to comment #50)

Thanks!
Hmm .. difficult.  
There seem to be 17 improvements and 10 dis-improvements, but the
dis-improvements seem heavier.  The improvement avg is 1.1% and the
dis-improvements avg is -2.1%.  I don't know .. maybe this should wait a bit
more.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-08-31 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #50 from Kazumoto Kojima  2012-08-31 
10:54:44 UTC ---
(In reply to comment #49)
> Kaz, if you have some time, could you please gather some CSiBE runtime numbers
> for '-mpretend-cmove' and without it?

Here is the runtime result with -O2:

test no cmoveratio(%)

bzip2-1.0.2 bzip2.d  10.976711.07-0.84312
bzip2-1.0.2 bzip2recover 4.703334.69333   0.213068
bzip2-1.0.2 bzip2.c  43.086743.73-1.47115
compiler vam.fib 2.026672.00667   0.996678
compiler vam.fact1.913331.89333   1.05634
compiler vam.test2   0.256667   0.27 -3.75
Here is the runtime result with -O2:

test no cmoveratio(%)

bzip2-1.0.2 bzip2.d  10.976711.07-0.84312
bzip2-1.0.2 bzip2recover 4.703334.69333   0.213068
bzip2-1.0.2 bzip2.c  43.086743.73-1.47115
compiler vam.fib 2.026672.00667   0.996678
compiler vam.fact1.913331.89333   1.05634
compiler vam.test2   0.256667   0.27 -3.75
flex-2.5.31 flex 13.18  13.02 1.22888
jikespg-1.3 jikespg  1.616671.6   1.04167
jpeg-6b jpegtran24.65   4.61  0.867679
jpeg-6b djpeg2   2.33   2.28667   1.89504
jpeg-6b djpeg1   2.293332.24667   2.07715
jpeg-6b cjpeg2   3.013332.99667   0.556174
jpeg-6b djpeg0   0.336667   0.35 -3.80952
jpeg-6b cjpeg0   0.476667   0.486667 -2.05479
jpeg-6b cjpeg1   3.063332.99667   2.22469
jpeg-6b jpegtran00.26   0.27 -2.46914
jpeg-6b jpegtran11.91.86667   1.78571
libpng-1.2.5 png2pnm00.986667   0.96  2.42215
libpng-1.2.5 pnm2png144.633345.6333  -2.19138
libpng-1.2.5 pnm2png07.936678.09333  -1.93575
libpng-1.2.5 png2pnm16.73   6.75 -0.296296
teem-1.6.0-src dehex01.67   1.66333   0.400802
teem-1.6.0-src dehex110.96  10.9367   0.21335
teem-1.6.0-src enhex141.176740.5733   1.48702
teem-1.6.0-src enhex06.183336.31 -2.0074
zlib-1.1.4 minigzip0 46.486746.2533   0.504468
zlib-1.1.4 minigzip  5.523335.50333   0.363416
flex-2.5.31 flex 13.18  13.02 1.22888
jikespg-1.3 jikespg  1.616671.6   1.04167
jpeg-6b jpegtran24.65   4.61  0.867679
jpeg-6b djpeg2   2.33   2.28667   1.89504
jpeg-6b djpeg1   2.293332.24667   2.07715
jpeg-6b cjpeg2   3.013332.99667   0.556174
jpeg-6b djpeg0   0.336667   0.35 -3.80952
jpeg-6b cjpeg0   0.476667   0.486667 -2.05479
jpeg-6b cjpeg1   3.063332.99667   2.22469
jpeg-6b jpegtran00.26   0.27 -2.46914
jpeg-6b jpegtran11.91.86667   1.78571
libpng-1.2.5 png2pnm00.986667   0.96  2.42215
libpng-1.2.5 pnm2png144.633345.6333  -2.19138
libpng-1.2.5 pnm2png07.936678.09333  -1.93575
libpng-1.2.5 png2pnm16.73   6.75 -0.296296
teem-1.6.0-src dehex01.67   1.66333   0.400802
teem-1.6.0-src dehex110.96  10.9367   0.21335
teem-1.6.0-src enhex141.176740.5733   1.48702
teem-1.6.0-src enhex06.183336.31 -2.0074
zlib-1.1.4 minigzip0 46.486746.2533   0.504468
zlib-1.1.4 minigzip  5.523335.50333   0.363416

[Bug target/51244] SH Target: Inefficient conditional branch

2012-08-30 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #49 from Oleg Endo  2012-08-30 
22:54:23 UTC ---
Kaz, if you have some time, could you please gather some CSiBE runtime numbers
for '-mpretend-cmove' and without it?

I've compared the result-size of the CSiBE set and with -mpretend-cmove there's
a total decrease of 948 bytes, with a few opposite cases.

My idea was to obsolete the -mpretend-cmove option, and instead tie its
behavior the new option -mzdcbranch, which generally is supposed to control any
kind of zero-displacement-branch handling.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-08-20 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #48 from Oleg Endo  2012-08-20 
20:51:12 UTC ---
Author: olegendo
Date: Mon Aug 20 20:51:06 2012
New Revision: 190544

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190544
Log:
PR target/51244
* config/sh/sh.md (*cset_zero): New insns.

PR target/51244
* gcc.target/sh/pr51244-11.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr51244-11.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog

[Bug target/51244] SH Target: Inefficient conditional branch

2012-08-12 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #47 from Oleg Endo  2012-08-12 
22:47:21 UTC ---
Author: olegendo
Date: Sun Aug 12 22:47:15 2012
New Revision: 190331

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190331
Log:
PR target/51244
* config/sh/sh.md: Add splits for inverted compare and branch
opportunities.
(*cmpeqsi_t): New insn.
(cmpgtsi_t, cmpgesi_t): Swap r and N alternatives.
(cmpgeusi_t): Use satisfies_constraint_Z.  Emit sett insn in
replacement insn list and not in the preparation statements.
(clrt, sett): Add mt_group attribute.

PR target/51244
* gcc.target/sh/pr51244-7.c: New.
* gcc.target/sh/pr51244-8.c: New.
* gcc.target/sh/pr51244-9.c: New.
* gcc.target/sh/pr51244-10.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr51244-10.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-7.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-8.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-9.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog

[Bug target/51244] SH Target: Inefficient conditional branch

2012-08-09 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #46 from Oleg Endo  2012-08-09 
15:55:23 UTC ---
Author: olegendo
Date: Thu Aug  9 15:55:18 2012
New Revision: 190258

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190258
Log:
PR target/51244
* config/sh/sh.md: Add negc extu sequence peephole.
(movrt, movnegt, movrt_negc, nott): Use t_reg_operand predicate.
(*movrt_negc): New insn.
* config/sh/sync.md (atomic_test_and_set): Pass gen_t_reg_rtx to
gen_movnegt.
* config/sh/sh.c (expand_cbranchsi4, sh_emit_scc_to_t,
sh_emit_compare_and_branch, sh_emit_compare_and_set): Use get_t_reg_rtx.
(sh_expand_t_scc): Pass gen_t_reg_rtx to gen_movnegt.

PR target/51244
* gcc.target/sh/pr51244-5: New.
* gcc.target/sh/pr51244-6: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr51244-5.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-6.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/config/sh/sync.md
trunk/gcc/testsuite/ChangeLog

[Bug target/51244] SH Target: Inefficient conditional branch

2012-07-29 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #45 from Oleg Endo  2012-07-30 
06:46:40 UTC ---
Author: olegendo
Date: Mon Jul 30 06:46:36 2012
New Revision: 189953

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189953
Log:
PR target/51244
* config/sh/sh.md (mov_neg_si_t): Move to Scc instructions section.
Use t_reg_operand predicate.  Add split for negated case.
(ashrsi2_31): Pass get_t_reg_rtx to gen_mov_neg_si_t.
* config/sh/sh.c (expand_ashiftrt): Likewise.

PR target/51244
* gcc.target/sh/pr51244-4.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr51244-4.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog

[Bug target/51244] SH Target: Inefficient conditional branch

2012-07-25 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #44 from Oleg Endo  2012-07-26 
00:20:05 UTC ---
Author: olegendo
Date: Thu Jul 26 00:19:58 2012
New Revision: 189877

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189877
Log:
PR target/51244
* config/sh/sh.opt (mzdcbranch): New option.
* doc/invoke.texi: Document it.
* config/sh/sh.md (negsi_cond): Use TARGET_ZDCBRANCH as condition
instead of TARGET_HARD_SH4.
* config/sh/sh.c (sh_option_override): Set TARGET_ZDCBRANCH as default
for TARGET_HARD_SH4.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/config/sh/sh.opt
trunk/gcc/doc/invoke.texi

[Bug target/51244] SH Target: Inefficient conditional branch

2012-07-23 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #43 from Oleg Endo  2012-07-23 
23:29:02 UTC ---
I have noticed that on SH the CANONICALIZE_COMPARISON macro is not defined,
although it seems to be useful for the combine pass.

Another thing that I'd like to try out is using zero-displacement branches to
implement conditional execution patterns and see how it performs.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-07-23 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #42 from Oleg Endo  2012-07-23 
22:57:42 UTC ---
Author: olegendo
Date: Mon Jul 23 22:57:36 2012
New Revision: 189797

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189797
Log:
PR target/51244
* config/sh/predicates.md (general_movsrc_operand,
general_movdst_operand): Reject T_REG.
* config/sh/sh.md (*extendqisi2_compact_reg, *extendhisi2_compact_reg,
movsi_i, movsi_ie, movsi_i_lowpart, *movqi_reg_reg, *movhi_reg_reg):
Remove T_REG alternatives.
(*negtstsi): New insn.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/predicates.md
trunk/gcc/config/sh/sh.md

[Bug target/51244] SH Target: Inefficient conditional branch

2012-07-08 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #41 from Oleg Endo  2012-07-08 
15:03:26 UTC ---
Author: olegendo
Date: Sun Jul  8 15:03:21 2012
New Revision: 189360

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189360
Log:
PR target/51244
* config/sh/sh.md (*branch_true_eq, *branch_false_ne, nott): New insns.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md

[Bug target/51244] SH Target: Inefficient conditional branch

2012-07-02 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #40 from Oleg Endo  2012-07-02 
19:24:03 UTC ---
Author: olegendo
Date: Mon Jul  2 19:23:56 2012
New Revision: 189177

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189177
Log:
PR target/51244
* config/sh/predicates.md (t_reg_operand, negt_reg_operand): New
predicates.
* config/sh/sh-protos.h (get_t_reg_rtx): New prototype.
* config/sh/sh.c (get_t_reg_rtx): New function.  Use it when invoking
gen_branch_true and gen_branch_false.
* config/sh/sh.md: Use get_t_reg_rtx when invoking gen_branch_true and
gen_branch_false.
(branch_true, branch_false): Use t_reg_operand predicate.
(*branch_true, *branch_false): Delete.
(movt): Use t_reg_operand predicate.
(*negnegt): Use negt_reg_operand predicate and fold little and big
endian variants.
(*movtt): Use t_reg_operand and fold little and big endian variants.
(*movt_qi): Delete.

PR target/51244
* gcc.target/sh/pr51244-1.c: Check that movt insn is not generated.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/predicates.md
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/pr51244-1.c

[Bug target/51244] SH Target: Inefficient conditional branch

2012-06-30 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #39 from Oleg Endo  2012-06-30 
12:00:38 UTC ---
Created attachment 27724
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27724
Another patch

I have noticed that the branch_true and branch_false insns also require
some subreg variations to work properly.  Otherwise redundant movt and tst
insns are generated.

I'm now testing the the attached patch, which fixes those issues.

In addition to that the subreg 0 / subreg 3 copy-pasta has been removed by
introducing t-bit predicates.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-05-08 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #38 from Oleg Endo  2012-05-08 
21:36:35 UTC ---
Author: olegendo
Date: Tue May  8 21:36:30 2012
New Revision: 187298

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187298
Log:
PR target/51244
* config/sh/sh.md (*branch_true, *branch_false): New insns.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md

[Bug target/51244] SH Target: Inefficient conditional branch

2012-05-07 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #37 from Oleg Endo  2012-05-07 
20:50:31 UTC ---
Created attachment 27336
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27336
Supplementary patch

As of rev 187217, the pr51244-1.c target testcase fails at least for m4*.
The attached patch adds some 'branch_true' and 'branch_false' subreg variants
which combine tries to use.  This seems to fix the problem.
I still would like to know whether there is a better way of handling the little
/ big endian subreg offsets in the patterns without doing copy-pasta.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-20 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #36 from Oleg Endo  2012-03-20 
20:33:30 UTC ---
I have created a new PR 52642 for the libstdc++ failures.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-19 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #35 from Kazumoto Kojima  2012-03-20 
01:45:14 UTC ---
(In reply to comment #34)
> Interesting, thanks!  I'll also test your patch and send it around, OK?

OK, thanks!

> I'm a bit confused... was the issue caused by my patches to for this PR, or by
> something else?

I guess that it was caused by another changes but was latent for a while.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-19 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #34 from Oleg Endo  2012-03-20 
01:04:19 UTC ---
(In reply to comment #33)
> FYI, looking into the libstdc++ failures for sh4-unknown-linux-gnu,
> it seems that the call insn was swapped before prologue frame insns
> and then it makes unwinder confused.  -fno-delayed-branch also stops
> that swapping for these failing cases.  The patch below works for me.
> [...]

Interesting, thanks!  I'll also test your patch and send it around, OK?

I'm a bit confused... was the issue caused by my patches to for this PR, or by
something else?

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-15 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #33 from Kazumoto Kojima  2012-03-15 
07:52:21 UTC ---
(In reply to comment #31)
> Created attachment 26859 [details]
> testresult on sh4-unknown-linux-gnu [trunk revision 185088].

FYI, looking into the libstdc++ failures for sh4-unknown-linux-gnu,
it seems that the call insn was swapped before prologue frame insns
and then it makes unwinder confused.  -fno-delayed-branch also stops
that swapping for these failing cases.  The patch below works for me.

* config/sh/sh.c (sh_expand_prologue): Emit blockage at the end
of prologue for unwinder and profiler.

--- ORIG/trunk/gcc/config/sh/sh.c2012-03-06 10:28:32.0 +0900
+++ trunk/gcc/config/sh/sh.c2012-03-14 20:22:15.0 +0900
@@ -7234,6 +7234,13 @@ sh_expand_prologue (void)
   emit_insn (gen_shcompact_incoming_args ());
 }

+  /* If we are profiling, make sure no instructions are scheduled before
+ the call to mcount.  Similarly if some call instructions are swapped
+ before frame related insns, it'll make unwinder confused because
+ currently SH has no unwind info for function epilogues.  */
+  if (crtl->profile || flag_exceptions || flag_unwind_tables)
+emit_insn (gen_blockage ());
+
   if (flag_stack_usage_info)
 current_function_static_stack_size = stack_usage;
 }

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-11 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #32 from Oleg Endo  2012-03-11 
13:18:12 UTC ---
Author: olegendo
Date: Sun Mar 11 13:18:08 2012
New Revision: 185192

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185192
Log:
PR target/51244
* config/sh/sh.md (movnegt): Expand into respective insns immediately.
Use movrt_negc instead of negc pattern for non-SH2A.
(*movnegt): Remove.
(*movrt_negc, *negnegt, *movtt, *movt_qi): New insns and splits.

PR target/51244
* gcc.target/sh/pr51244-1.c: Fix thinkos.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/sh/pr51244-1.c

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-09 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #31 from Kazumoto Kojima  2012-03-09 
10:36:31 UTC ---
Created attachment 26859
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26859
A test result

testresult on sh4-unknown-linux-gnu [trunk revision 185088].

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-09 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #30 from Oleg Endo  2012-03-09 
10:02:25 UTC ---
(In reply to comment #29)
> (In reply to comment #28)
> Regtest on sh4-unknown-lunix-gnu has been done successfully.
> Oleg, your patch is pre-approved.

Thanks a lot!
Could you please attach the testsuite summary of your setup?  I'd like to
compare them to my results (in particular the libstdc++ tests).
I'm now getting similar effects as in #comment 9 again, where a bunch of
libstdc++ failures disappear and this time one new failure appears:
FAIL: 21_strings/basic_string/cons/char/6.cc execution test

This is weird...

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-09 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #29 from Kazumoto Kojima  2012-03-09 
08:40:32 UTC ---
(In reply to comment #28)
Regtest on sh4-unknown-lunix-gnu has been done successfully.
Oleg, your patch is pre-approved.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-08 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #28 from Kazumoto Kojima  2012-03-09 
01:44:52 UTC ---
(In reply to comment #27)
> Created attachment 26858 [details]
> Patch for the patch

Looks all fortran regressions gone away.  I'll run full tests
on sh4-unknown-lunix-gnu.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-08 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo  changed:

   What|Removed |Added

  Attachment #26853|0   |1
is obsolete||

--- Comment #27 from Oleg Endo  2012-03-09 
00:26:39 UTC ---
Created attachment 26858
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26858
Patch for the patch


> Old tests that failed, that have disappeared: (Eeek!)
>
> 22_locale/ctype/is/char/3.cc execution test
> 27_io/basic_filebuf/underflow/wchar_t/9178.cc execution test
> gfortran.dg/widechar_intrinsics_6.f90  -Os  execution test

That was a feature ;)

> I've attached .s files against gfortran.dg/associated_4.f90 -O1 with
> patched/unpatched compilers.

I'm sorry, I got the definition of the negc opcode wrong in the movrt_negc
pattern.  negc leaves the T bit always at '1' in this particular case, instead
of inverting the T bit.  It is funny that in C/C++ code it was never actually
trying to re-use the T bit after the negc, but in Fortran it did.  And that's
what went wrong.

I'm now testing the attached patch for C/C++ ...

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-08 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #26 from Kazumoto Kojima  2012-03-08 
11:16:39 UTC ---
Created attachment 26855
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26855
unworked .s file associated_4_bad.s

I've attached .s files against gfortran.dg/associated_4.f90 -O1 with
patched/unpatched compilers.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-08 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #25 from Kazumoto Kojima  2012-03-08 
11:13:39 UTC ---
Created attachment 26854
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26854
worked .s file associated_4_good.s

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-08 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #24 from Kazumoto Kojima  2012-03-08 
11:11:32 UTC ---
(In reply to comment #23)
> Kaz, if you have some time, could you try it out in your setup, too please?

On trunk revision 185088, for sh4-unknown-linux-gnu, the result of
compare_tests is:

New tests that FAIL:

gfortran.dg/associated_4.f90  -O1  execution test
gfortran.dg/forall_4.f90  -O3 -fomit-frame-pointer  execution test
gfortran.dg/forall_4.f90  -O3 -fomit-frame-pointer -funroll-all-loops
-finline-functions  execution test
gfortran.dg/forall_4.f90  -O3 -fomit-frame-pointer -funroll-loops  execution
test
gfortran.dg/forall_4.f90  -O3 -g  execution test

Old tests that failed, that have disappeared: (Eeek!)

22_locale/ctype/is/char/3.cc execution test
27_io/basic_filebuf/underflow/wchar_t/9178.cc execution test
gfortran.dg/widechar_intrinsics_6.f90  -Os  execution test

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-07 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #23 from Oleg Endo  2012-03-08 
01:25:21 UTC ---
Created attachment 26853
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26853
Patch for the patch

The attached patch seems to fix the problem.
GCC (C,C++) and CSiBE set compiles with it.  Now doing the full testsuite...

Kaz, if you have some time, could you try it out in your setup, too please?

A thing that bugs me regarding the attached patch is the big/little endian
subreg copy pasta in the patterns *negnegt, *movtt, *movt_qi.  Isn't there a
way to avoid that?

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #22 from Oleg Endo  2012-03-06 
23:42:15 UTC ---
This is a reduced test case:

int test (volatile int* a, int b, int c)
{
  a[1] = b != 0;

  if (b == 0)
a[10] = c;

  return b == 0;
}

with '-O2 -m4-single -mb' it gets compiled to:

tst r5,r5   ! b == 0 -> T
mov #-1,r1
negcr1,r1   ! b != 0 -> T, r1
mov.l   r1,@(4,r4)
bf  .L2 ! branch if (b == 0)
mov.l   r6,@(40,r4)
.L2:
tst r5,r5
rts
movtr0

This is because in the 'movnegt' expander it is not mentioned that the T bit is
modified and the first CSE pass optimizes away the 'b == 0' test before the
branch.  I'm trying to come up with some alternative approaches...

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #21 from Oleg Endo  2012-03-06 
11:29:17 UTC ---
(In reply to comment #20)

> I've confirmed that 22_locale/ctype/is/char/3.cc doesn't fail
> if linking with libstdc++.a which is built with the compiler
> without r184966 changes. The .s files against 3.cc are same with
> the both compilers.  It looks that the problematic object is
> libstdc++-v3/src/c++98/ctype_configure_char.o because the error
> went away if replacing it with another one.  I've attached .i and
> .s files for that file.  The option used is [...]

Cool.  Thanks a lot!  I think I know what the problem is now.  Looking into
it...

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #20 from Kazumoto Kojima  2012-03-06 
10:40:31 UTC ---
(In reply to comment #16)
> Can we keep the r184966 changes anyways?  I will keep an eye on these failures
> whether I can reproduce them.  If you have some time, could you please send me
> the intermediate .i and .s files of the failing and passing version of the
> '22_locale/ctype/is/char/3.cc' test case?

I've confirmed that 22_locale/ctype/is/char/3.cc doesn't fail
if linking with libstdc++.a which is built with the compiler
without r184966 changes. The .s files against 3.cc are same with
the both compilers.  It looks that the problematic object is
libstdc++-v3/src/c++98/ctype_configure_char.o because the error
went away if replacing it with another one.  I've attached .i and
.s files for that file.  The option used is

COLLECT_GCC_OPTIONS='-shared-libgcc' '-B' '/exp/ldroot/dodes/xsh-gcc/./gcc'
'-nostdinc++'
'-L/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/src'
'-L/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/src/.libs'
'-B' '/usr/local/sh4-unknown-linux-gnu/bin/' '-B'
'/usr/local/sh4-unknown-linux-gnu/lib/' '-isystem'
'/usr/local/sh4-unknown-linux-gnu/include' '-isystem'
'/usr/local/sh4-unknown-linux-gnu/sys-include' '-I'
'/exp/ldroot/dodes/ORIG/trunk/libstdc++-v3/../libgcc' '-I'
'/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/include/sh4-unknown-linux-gnu'
'-I'
'/exp/ldroot/dodes/xsh-gcc-orig/sh4-unknown-linux-gnu/libstdc++-v3/include'
'-I' '/exp/ldroot/dodes/ORIG/trunk/libstdc++-v3/libsupc++'
'-fno-implicit-templates' '-Wall' '-Wextra' '-Wwrite-strings' '-Wcast-qual'
'-Wabi' '-fdiagnostics-show-location=once' '-ffunction-sections'
'-fdata-sections' '-frandom-seed=ctype_configure_char.lo' '-g' '-O2' '-D'
'_GNU_SOURCE' '-S' '-fPIC' '-D' 'PIC' '-o'

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #19 from Kazumoto Kojima  2012-03-06 
10:38:22 UTC ---
Created attachment 26839
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26839
unworked .s file ctype_configure_char_bad.s

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #18 from Kazumoto Kojima  2012-03-06 
10:37:13 UTC ---
Created attachment 26838
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26838
worked .s file ctype_configure_char_good.s

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #17 from Kazumoto Kojima  2012-03-06 
10:36:01 UTC ---
Created attachment 26837
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26837
preprocessed file ctype_configure_char.i

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #16 from Oleg Endo  2012-03-06 
09:48:31 UTC ---
(In reply to comment #15)
> I've seen same failures on sh4-unknown-linux-gnu for trunk rev 184971.
> With backing r184966 changes out, they went away.  Weird.

Can we keep the r184966 changes anyways?  I will keep an eye on these failures
whether I can reproduce them.  If you have some time, could you please send me
the intermediate .i and .s files of the failing and passing version of the
'22_locale/ctype/is/char/3.cc' test case?

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #15 from Kazumoto Kojima  2012-03-06 
08:49:27 UTC ---
(In reply to comment #14)
> I've run the testsuite on rev 184966 (without fortran though), but the 
> failures
> that you've mentioned did not show up.  Usually when I rebuild the whole
> toolchain including newlib, I have C/CPP/CXXFLAGS_FOR_TARGET set to '-Os
> -mpretend-cmove'.  This time I removed those, but the results seem to be the
> same.  Could you also please try again?  This is suspicious...

I've seen same failures on sh4-unknown-linux-gnu for trunk rev 184971.
With backing r184966 changes out, they went away.  Weird.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-06 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #14 from Oleg Endo  2012-03-06 
08:26:06 UTC ---
(In reply to comment #13)
> On Tue, 2012-03-06 at 08:13 +0900, Kaz Kojima wrote:
> 
> > I've tested your latest patch on sh4-unknown-linux-gnu with trunk
> > revision 184872.  It looks that some new failures are poping up:
> > 
> > New tests that FAIL:
> > 
> > 22_locale/ctype/is/char/3.cc execution test
> > 27_io/basic_filebuf/underflow/wchar_t/9178.cc execution test
> > gfortran.dg/widechar_intrinsics_6.f90  -Os  execution test
> > 
> > Pehaps failures might be ones you've suggested in #10 in
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244
> > 
> > Could you double check?
> 
> Of course!  Doing so now...

I've run the testsuite on rev 184966 (without fortran though), but the failures
that you've mentioned did not show up.  Usually when I rebuild the whole
toolchain including newlib, I have C/CPP/CXXFLAGS_FOR_TARGET set to '-Os
-mpretend-cmove'.  This time I removed those, but the results seem to be the
same.  Could you also please try again?  This is suspicious...

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-05 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #13 from Oleg Endo  2012-03-05 
23:37:35 UTC ---
On Tue, 2012-03-06 at 08:13 +0900, Kaz Kojima wrote:

> I've tested your latest patch on sh4-unknown-linux-gnu with trunk
> revision 184872.  It looks that some new failures are poping up:
> 
> New tests that FAIL:
> 
> 22_locale/ctype/is/char/3.cc execution test
> 27_io/basic_filebuf/underflow/wchar_t/9178.cc execution test
> gfortran.dg/widechar_intrinsics_6.f90  -Os  execution test
> 
> Pehaps failures might be ones you've suggested in #10 in
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244
> 
> Could you double check?

Of course!  Doing so now...

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-05 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #12 from Oleg Endo  2012-03-05 
23:12:27 UTC ---
Author: olegendo
Date: Mon Mar  5 23:12:20 2012
New Revision: 184966

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184966
Log:
PR target/51244
* config/sh/sh.c (sh_expand_t_scc): Remove SH2A special case
and use unified expansion logic.
* config/sh/sh.md (xorsi3_movrt): Rename to movrt.  Move
closer to the existing movt insn.
(negc): Rename insn to *negc.  Add new expander.
(movnegt): Use xor pattern for T bit negation.  Reserve helper
constant for negc pattern.
(*movnegt): New insn and splitter.

PR target/51244
* gcc.target/sh/pr51244-1.c: New.
* gcc.target/sh/pr51244-2.c: New.
* gcc.target/sh/pr51244-3.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr48596.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-1.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-2.c
trunk/gcc/testsuite/gcc.target/sh/pr51244-3.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-04 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo  changed:

   What|Removed |Added

  Attachment #26812|0   |1
is obsolete||

--- Comment #11 from Oleg Endo  2012-03-04 
17:24:44 UTC ---
Created attachment 26822
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26822
Proposed patch

This patch should be better now.
However, I'm not sure how well this will work with SH64 due to the (arbitrary)
TARGET_SH1 conditions in the insns.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-03 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #10 from Oleg Endo  2012-03-03 
12:32:29 UTC ---
(In reply to comment #9)
> Created attachment 26812 [details]
> Proposed patch
> 
> I've tested this patch again against rev 184764 (GCC 4.7) with
> 
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{
> -m2/-ml,-m2/-mb,-m2a-single/-mb,-m4-single/-ml,
> -m4-single/-mb,-m4a-single/-ml,-m4a-single/-mb}"
> 
> Surprisingly, it fixes the following libstdc++ tests.
> 

That was a false alarm.  I've messed up the test results somehow.
The libstdc++ test case fixes have nothing to do with the patch, but rather
rev 184764 vs. rev 184829.  Sorry for any confusion.

> 
> However, it also introduces two new of new failures.
> 
> For all sub targets:
> FAIL: 21_strings/basic_string/cons/char/6.cc execution test
> 
> For -m4a-single and -m4-single (-ml and -mb):
> FAIL: 22_locale/ctype/is/char/3.cc execution test
> 
> I'm looking into what is happening in the two cases.

It seems that when building newlib something gets messed up related to delayed
branches.  Building newlib with -fno-delayed-branch seems to make the failures
go away.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-03-02 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

Oleg Endo  changed:

   What|Removed |Added

  Attachment #26191|0   |1
is obsolete||

--- Comment #9 from Oleg Endo  2012-03-02 21:56:38 
UTC ---
Created attachment 26812
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26812
Proposed patch

I've tested this patch again against rev 184764 (GCC 4.7) with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{
-m2/-ml,-m2/-mb,-m2a-single/-mb,-m4-single/-ml,
-m4-single/-mb,-m4a-single/-ml,-m4a-single/-mb}"

Surprisingly, it fixes the following libstdc++ tests.

For all sub targets:
23_containers/forward_list/requirements/exception/basic.cc
23_containers/forward_list/requirements/exception/propagation_consistent.cc
23_containers/list/requirements/exception/basic.cc
23_containers/list/requirements/exception/propagation_consistent.cc
23_containers/multiset/requirements/exception/basic.cc
23_containers/multiset/requirements/exception/propagation_consistent.cc
23_containers/unordered_map/requirements/exception/propagation_consistent.cc
23_containers/unordered_multimap/requirements/exception/basic.cc
23_containers/unordered_multiset/requirements/exception/basic.cc
23_containers/unordered_multiset/requirements/exception/propagation_consistent.cc
23_containers/unordered_set/requirements/exception/propagation_consistent.cc
ext/pb_ds/regression/list_update_map_rand.cc
ext/pb_ds/regression/list_update_set_rand.cc

For -m4a-single and -m4-single (-ml and -mb):
23_containers/forward_list/requirements/exception/basic.cc
23_containers/forward_list/requirements/exception/propagation_consistent.cc
23_containers/list/requirements/exception/basic.cc
23_containers/list/requirements/exception/propagation_consistent.cc
23_containers/multiset/requirements/exception/basic.cc
23_containers/multiset/requirements/exception/propagation_consistent.cc

However, it also introduces two new of new failures.

For all sub targets:
FAIL: 21_strings/basic_string/cons/char/6.cc execution test

For -m4a-single and -m4-single (-ml and -mb):
FAIL: 22_locale/ctype/is/char/3.cc execution test

I'm looking into what is happening in the two cases.

[Bug target/51244] SH Target: Inefficient conditional branch

2012-02-26 Thread olegendo at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

olegendo at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2012-02-26
 CC||olegendo at gcc dot gnu.org
 AssignedTo|unassigned at gcc dot   |olegendo at gcc dot gnu.org
   |gnu.org |
 Ever Confirmed|0   |1

[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-30 Thread oleg.e...@t-online.de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #8 from Oleg Endo  2011-12-30 21:21:14 UTC 
---
(In reply to comment #7)
> (In reply to comment #3)
> > I haven't ran all tests on it yet, but CSiBE shows average code size 
> > reduction
> > of approx. -0.1% for -m4* with some code size increases in some files.
> > Would something like that be OK for stage 3?
> 
> Looks good, though not appropriate for stage 3, I think.

The patch passed the testsuite without new failures.  I'll queue it up for
stage 1.

[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-28 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #7 from Kazumoto Kojima  2011-12-28 
22:25:48 UTC ---
(In reply to comment #3)
> I haven't ran all tests on it yet, but CSiBE shows average code size reduction
> of approx. -0.1% for -m4* with some code size increases in some files.
> Would something like that be OK for stage 3?

Looks good, though not appropriate for stage 3, I think.

[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-28 Thread oleg.e...@t-online.de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #6 from Oleg Endo  2011-12-28 15:59:35 UTC 
---
(In reply to comment #3)
> Created attachment 26191 [details]
> Proposed patch to improve some of the issues.
> 
> The attached patch removes the useless sequence and still allows the -1
> constant to be CSE-ed for such cases as the example function above.
> 
> I haven't ran all tests on it yet, but CSiBE shows average code size reduction
> of approx. -0.1% for -m4* with some code size increases in some files.

Some of the code size increases are caused by the ifcvt.c pass which tries to
transform sequences like:

int test_func_6 (int a, int b, int c)
{
  if (a == 16)
c = 0;
  return b + c;
}

into branch-free code like:
mov r4,r0   ! 45movsi_ie/2[length = 2]
cmp/eq  #16,r0  ! 9 cmpeqsi_t/2[length = 2]
mov #-1,r0  ! 34movsi_ie/3[length = 2]
negcr0,r0   ! 38*negc[length = 2]
neg r0,r0   ! 36negsi2[length = 2]
and r6,r0   ! 37*andsi3_compact/2[length = 2]
rts ! 48*return_i[length = 2]
add r5,r0   ! 14*addsi3_compact[length = 2]

instead of the more compact (and on SH4 most likely better):
movr4,r0   ! 41movsi_ie/2[length = 2]
cmp/eq#16,r0  ! 9cmpeqsi_t/2[length = 2]
bf0f  ! 34*movsicc_t_true/2[length = 4]
mov#0,r6
0:
addr5,r6   ! 14*addsi3_compact[length = 2]
rts ! 44*return_i[length = 2]
movr6,r0   ! 19movsi_ie/2[length = 2]

This particular case is handled in noce_try_store_flag_mask, which does the
transformation if BRANCH_COST >= 2, which is true for -m4.  I guess before the
patch ifcvt didn't realize that this transformation can be applied.

I've tried setting BRANCH_COST to 1, which avoids this transformation but
increases overall code size a bit.

[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-27 Thread oleg.e...@t-online.de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #5 from Oleg Endo  2011-12-28 02:44:05 UTC 
---
(In reply to comment #2)
> (In reply to comment #1)
> > 
> > BTW, OT, (a != b || a != c) ? b : c could be reduced to b, I think.
> > 
> 
> Yes, very much so.
> It is reduced to "return b" for -m2, -m2e, -m2a, -m3, -m3e
> but not for -m1 and -m4*.

This seems to be due to the following in sh.h:

#define BRANCH_COST(speed_p, predictable_p) \
(TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)

[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-27 Thread oleg.e...@t-online.de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #4 from Oleg Endo  2011-12-27 23:17:03 UTC 
---
(In reply to comment #1)
> 
> >  return a >= 0 && b >= 0 ? c : d;
> 
> x >= 0 is expanded to the sequence like
> 
>   ra = not x
>   rb = -31
>   rc = ra >> (neg rb)
>   T = (rc == 0)
>   conditional jump
> 
> and combine tries to simplify it.  combine simplifies b >= 0
> successfully into shll and bt but fails to simplify a >= 0.
> It seems that combine doesn't do constant propagation well and
> misses the constant -31.

Another simpler fail:

int test_func_22_NG (int a, int b, int c, int d)
{
  return a >= 0;
}

becomes:
not r4,r0   ! 9one_cmplsi2[length = 2]
mov #-31,r1 ! 12movsi_ie/3[length = 2]
rts ! 31*return_i[length = 2]
shldr1,r0   ! 13lshrsi3_d[length = 2]

which could be:
cmp/pzr4
rts
movtr0

>From what I could observe, this is caused by the various shift insns which
leads combine to this result.  For example, the shll, branch sequence that
is used instead of cmp/pz, branch is caused by the ashlsi_c insn, which
defines a lt:SI comparison.  Although that is correct, using cmp/pz could
be better, since it does not modify the reg, and on SH4 it is an MT group
insn.  The ashlsi_c insn / lt:SI picking can be avoided by adjusting the 
rtx costs, for instance (just tried it out briefly).

I think a peephole in this case could fix some of the symptoms but not the
actual cause.  I'll see if I can come up with something that works without a
peephole, even though all the shift stuff looks a bit suspicious ;)

[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-27 Thread oleg.e...@t-online.de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #3 from Oleg Endo  2011-12-27 22:43:11 UTC 
---
Created attachment 26191
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26191
Proposed patch to improve some of the issues.

(In reply to comment #1)
> 
> [...]
> 
>   mov #-1,rn
>   negc rn,rm
>   tst #255,rm
> 
> which is essentially T_reg = T_reg.  Usually combine catches such
> situation, but negc might be too complex for combine.
> For this case, replacing current movnegt expander by insn, splitter
> and peephole something like
> 
> [...]
>
> the above useless sequence could be removed, though we will miss
> the chance that the -1 can be CSE-ed when the cstore value is
> used.  This will cause a bit worse code for the loop like
> 
> int
> foo (int *a, int x, int n)
> {
>   int i;
>   int count;
> 
>   for (i = 0; i < n; i++)
> count += (*(a + i) != x);
> 
>   return count;
> }
> 

Thanks for your ideas and comments.  It was really useful.

The attached patch removes the useless sequence and still allows the -1
constant to be CSE-ed for such cases as the example function above.

I haven't ran all tests on it yet, but CSiBE shows average code size reduction
of approx. -0.1% for -m4* with some code size increases in some files.
Would something like that be OK for stage 3?

[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-27 Thread oleg.e...@t-online.de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #2 from Oleg Endo  2011-12-27 21:26:33 UTC 
---
(In reply to comment #1)
> 
> BTW, OT, (a != b || a != c) ? b : c could be reduced to b, I think.
> 

Yes, very much so.
It is reduced to "return b" for -m2, -m2e, -m2a, -m3, -m3e
but not for -m1 and -m4*.

The correct test function should be rather:

int test_func_0_NG (int a, int b, int c, int d)
{
  return (a != b || a != d) ? b : c;
}

which is actually OK for all variants except -m1 and -m4*:

cmp/eqr5,r4! 11cmpeqsi_t/3[length = 2]
bf.s.L6! 12branch_false[length = 2]
cmp/eqr7,r5! 14cmpeqsi_t/3[length = 2]
bf.L6! 15branch_false[length = 2]
movr6,r5! 8movsi_i/2[length = 2]
.L6:
rts! 42*return_i[length = 2]
movr5,r0! 23movsi_i/2[length = 2]

[Bug target/51244] SH Target: Inefficient conditional branch

2011-11-22 Thread kkojima at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #1 from Kazumoto Kojima  2011-11-22 
22:33:43 UTC ---
>  return (a != b || a != c) ? b : c;

test_func_0_NG and test_func_1_NG cases are related with the target
implementation of cstoresi4.
The middle end expands a complex conditional jump to cstores and
a simple conditional jumps.  For expression a != b, SH's cstoresi4
implementation uses sh.c:sh_emit_compare_and_set which generates
cmp/eq and movnegt insn, because we have no cmp/ne insn.  Then we've
got the sequence

  mov #-1,rn
  negc rn,rm
  tst #255,rm

which is essentially T_reg = T_reg.  Usually combine catches such
situation, but negc might be too complex for combine.
For this case, replacing current movnegt expander by insn, splitter
and peephole something like

(define_insn "movnegt"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
(plus:SI (reg:SI T_REG) (const_int -1)))
   (clobber (match_scratch:SI 1 "=&r"))
   (clobber (reg:SI T_REG))]
  ""
  "#"
 [(set_attr "length" "4")])

(define_split
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
(plus:SI (reg:SI T_REG) (const_int -1)))
   (clobber (match_scratch:SI 1 "=&r"))
   (clobber (reg:SI T_REG))]
  "reload_completed"
  [(set (match_dup 1) (const_int -1))
   (parallel [(set (match_dup 0)
   (neg:SI (plus:SI (reg:SI T_REG)
(match_dup 1
  (set (reg:SI T_REG)
   (ne:SI (ior:SI (reg:SI T_REG) (match_dup 1))
  (const_int 0)))])]
  "")

(define_peephole2
  [(set (match_operand:SI 1 "" "") (const_int -1))
   (parallel [(set (match_operand:SI 0 "" "")
   (neg:SI (plus:SI (reg:SI T_REG)
(match_dup 1
  (set (reg:SI T_REG)
   (ne:SI (ior:SI (reg:SI T_REG) (match_dup 1))
  (const_int 0)))])
   (set (reg:SI T_REG)
(eq:SI (match_operand:QI 3 "" "") (const_int 0)))]
  "REGNO (operands[3]) == REGNO (operands[0])
   && peep2_reg_dead_p (3, operands[0])
   && peep2_reg_dead_p (3, operands[1])"
  [(const_int 0)]
  "")

the above useless sequence could be removed, though we will miss
the chance that the -1 can be CSE-ed when the cstore value is
used.  This will cause a bit worse code for the loop like

int
foo (int *a, int x, int n)
{
  int i;
  int count;

  for (i = 0; i < n; i++)
count += (*(a + i) != x);

  return count;
}

though it may be relatively rare.

BTW, OT, (a != b || a != c) ? b : c could be reduced to b, I think.

>  return a >= 0 && b >= 0 ? c : d;

x >= 0 is expanded to the sequence like

  ra = not x
  rb = -31
  rc = ra >> (neg rb)
  T = (rc == 0)
  conditional jump

and combine tries to simplify it.  combine simplifies b >= 0
successfully into shll and bt but fails to simplify a >= 0.
It seems that combine doesn't do constant propagation well and
misses the constant -31.  In this case, a peephole like

(define_peephole2
  [(set (match_operand:SI 0 "arith_reg_dest" "")
(not:SI (match_operand:SI 1 "arith_reg_operand" "")))
   (set (match_operand:SI 2 "arith_reg_dest" "") (const_int -31))
   (set (match_operand:SI 3 "arith_reg_dest" "")
(lshiftrt:SI (match_dup 0) (neg:SI (match_dup 2
   (set (reg:SI T_REG)
(eq:SI (match_operand:QI 4 "arith_reg_operand" "")
   (const_int 0)))
   (set (pc)
(if_then_else (match_operator 5 "comparison_operator"
[(reg:SI T_REG) (const_int 0)])
  (label_ref (match_operand 6 "" ""))
  (pc)))]
  "REGNO (operands[3]) == REGNO (operands[4])
   && peep2_reg_dead_p (4, operands[0])
   && (peep2_reg_dead_p (4, operands[3])
   || rtx_equal_p (operands[2], operands[3]))
   && peep2_regno_dead_p (5, T_REG)"
  [(set (match_dup 2) (const_int -31))
   (set (reg:SI T_REG) (ge:SI (match_dup 1) (const_int 0)))
   (set (pc)
(if_then_else (match_op_dup 7 [(reg:SI T_REG) (const_int 0)])
  (label_ref (match_dup 6))
  (pc)))]
  "
{
  operands[7] = gen_rtx_fmt_ee (reverse_condition (GET_CODE (operands[5])),
GET_MODE (operands[5]),
XEXP (operands[5], 0), XEXP (operands[5], 1));
}")

will be a workaround.  It isn't ideal, but better than nothing.

>  return a == b ? test_sub0 (a, b) : test_sub1 (a, b);
>  return a != b ? test_sub0 (a, b) : test_sub1 (a, b);

This case is intresting.  At -Os, two calls are converted into
one computed goto.  A bit surprisingly, the conversion is done
as a side effect of combine-stack-adjustments pass.  That pass
calls

  cleanup_cfg (flag_crossjumping ? CLEANUP_CROSSJUMP : 0);

and the cross jumping optimization merges two calls.
With -Os -fno-delayed-branch, the OK case is compiled to

test_func_3_OK:
mov r4,r1
cmp/eq  r5,r1
mov.l   .L4,r0
bf  .L3
mov r1,r5
mov.l   .L5,r0
bra .L3
nop
.L3:
jmp @r0
nop

and the NG case

test_func_3_NG:
mov r4,r1
cmp/eq  r5,r1
bt  .L2
mov.l   .L4,r0
bra .L

53 matches

Mail list logo