Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Kaz Kojima
Oleg Endo  wrote:
> To be honest, I had some difficulty picking the name.
> Maybe something like 'sh_tbit_combine' or 'sh_treg_combine' would be
> better, or at least less confusing?  Suggestions are highly appreciated.

'sh_treg_combine' or 'sh_combine_treg' sounds good to me.

Regards,
kaz


Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Oleg Endo
On Mon, 2013-10-07 at 07:44 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > Forgot to handle a case in function can_remove_cstore, thanks for
> > catching it.  Fixed in the attached patch and also added test cases.
> > Retested as before without new failures.
> 
> Ok for trunk.
> 
> > Yeah, right.  I've changed 'ifcvt_sh' to 'sh_ifcvt'.
> 
> >+  register_pass (make_pass_sh_ifcvt (g, false, "ifcvt1_sh"),
> >+ PASS_POS_INSERT_AFTER, "ce1", 1);
> >+*/
> 
> s/ifcvt1_sh/sh_ifcvt1/ might be better even in a comment.

Sorry, I missed one.  Will fix and resend the committed patch after we
have agreed on the pass name (see Christian's message).

Cheers,
Oleg



Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Oleg Endo
On Mon, 2013-10-07 at 10:30 +0200, Christian Bruel wrote:
> Hi Oleg,
> 
> +/*
> +This pass tries to optimize for example this:
> + mov.l   @(4,r4),r1
> + tst r1,r1
> + movtr1
> + tst r1,r1
> + bt/s.L5
> +
> +into something simpler:
> + mov.l   @(4,r4),r1
> + tst r1,r1
> + bf/s.L5
> +
> +Such sequences can be identified by looking for conditional branches and
> +checking whether the ccreg is set before the conditional branch
> +by testing another register for != 0, which was set by a ccreg store.
> +This can be optimized by eliminating the redundant comparison and
> +inverting the branch condition.  There can be multiple comparisons in
> +different basic blocks that all end up in the redunant test insn before the
> +conditional branch.  Some example RTL ...
> +
> 
> Nice things to optimize the sequences when t-bit values are not
> recognized due to branches direction, I have 2 questions
> 
> 1) I find the name "if-conversion" for this pass a little bit forced,
> since you don't aim to remove branches. If looks more like some kind of
> extension value numbering. 

To be honest, I had some difficulty picking the name.
Maybe something like 'sh_tbit_combine' or 'sh_treg_combine' would be
better, or at least less confusing?  Suggestions are highly appreciated.

> 2) I'm wondering in which extend this case could be handled by a more
> global generic target value numbering to handle boolean operations.
> Maybe just a phasing problem as the branch directions are not yet
> computed in gimple-ssa, which would mean reworking in RTL ?

I don't know.  What the pass currently does looks like highly
specialized combine (try_eliminate_cstores) and a little bit of CSE
(try_combine_comparisons), I think.
The problems on SH are due to the way insns are expanded and the T bit
handling (SImode hardreg, etc), I guess.  Moreover, some of the SH insns
might split out cstores and inverted cstores after combine, which then
would result in those sequences.  Combine itself sometimes can catch
some of the cases, but not all, or at least not with a finite amount of
combine patterns ;)

Cheers,
Oleg



Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Christian Bruel
Hi Oleg,

+/*
+This pass tries to optimize for example this:
+   mov.l   @(4,r4),r1
+   tst r1,r1
+   movtr1
+   tst r1,r1
+   bt/s.L5
+
+into something simpler:
+   mov.l   @(4,r4),r1
+   tst r1,r1
+   bf/s.L5
+
+Such sequences can be identified by looking for conditional branches and
+checking whether the ccreg is set before the conditional branch
+by testing another register for != 0, which was set by a ccreg store.
+This can be optimized by eliminating the redundant comparison and
+inverting the branch condition.  There can be multiple comparisons in
+different basic blocks that all end up in the redunant test insn before the
+conditional branch.  Some example RTL ...
+

Nice things to optimize the sequences when t-bit values are not recognized due 
to branches direction, I have 2 questions

1) I find the name "if-conversion" for this pass a little bit forced, since you 
don't aim to remove branches. If looks more like some kind of extension value 
numbering. 

2) I'm wondering in which extend this case could be handled by a more global 
generic target value numbering to handle boolean operations. Maybe just a 
phasing problem as the branch directions are not yet computed in gimple-ssa, 
which would mean reworking in RTL ?

Many thanks,

Christian 



Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-06 Thread Kaz Kojima
Oleg Endo  wrote:
> Forgot to handle a case in function can_remove_cstore, thanks for
> catching it.  Fixed in the attached patch and also added test cases.
> Retested as before without new failures.

Ok for trunk.

> Yeah, right.  I've changed 'ifcvt_sh' to 'sh_ifcvt'.

>+  register_pass (make_pass_sh_ifcvt (g, false, "ifcvt1_sh"),
>+   PASS_POS_INSERT_AFTER, "ce1", 1);
>+*/

s/ifcvt1_sh/sh_ifcvt1/ might be better even in a comment.

Regards,
kaz


Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-04 Thread Kaz Kojima
Oleg Endo  wrote:
> Some of the things I've done in 4.8 to improve SH T bit handling turned
> out to produce wrong code.  The attached patch fixes that by introducing
> an SH specific RTL pass.
> 
> Tested on rev 202876 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures.
> Additional test cases will follow.
> OK for trunk?

I've got a new failure with the patch on sh4-unknown-linux-gnu:

New tests that FAIL:

gfortran.fortran-torture/execute/forall_7.f90 execution,  -O3 -g 

> ifcvt_sh

We usually use sh_/sh- prefix for target specific things, don't we?
I'm not sure that there is some rigid naming convention for them,
though.

Regards,
kaz


[SH] PR 51244 - Fix defects introduced in 4.8

2013-10-04 Thread Oleg Endo
Hello,

Some of the things I've done in 4.8 to improve SH T bit handling turned
out to produce wrong code.  The attached patch fixes that by introducing
an SH specific RTL pass.

Tested on rev 202876 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
Additional test cases will follow.
OK for trunk?

Cheers,
Oleg

gcc/ChangeLog:

PR target/51244
* config/sh/ifcvt_sh.cc: New SH specific RTL pass.
* config.gcc (SH extra_objs): Add ifcvt_sh.o.
* config/sh/t-sh (ifcvt_sh.o): New entry.
* config/sh/sh.c (sh_fixed_condition_code_regs): New function 
that implements the target hook TARGET_FIXED_CONDITION_CODE_REGS.
(register_sh_passes): New function.  Register ifcvt_sh pass.
(sh_option_override): Invoke it.
(sh_canonicalize_comparison): Handle op0_preserve_value.
* sh.md (*cbranch_t"): Do not try to optimize missed test and 
branch opportunities.  Canonicalize branch condition.
(nott): Allow only if pseudos can be created for non-SH2A.
Index: gcc/config.gcc
===
--- gcc/config.gcc	(revision 202876)
+++ gcc/config.gcc	(working copy)
@@ -462,6 +462,7 @@
 	cpu_type=sh
 	need_64bit_hwint=yes
 	extra_options="${extra_options} fused-madd.opt"
+	extra_objs="${extra_objs} ifcvt_sh.o"
 	;;
 v850*-*-*)
 	cpu_type=v850
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 202876)
+++ gcc/config/sh/sh.c	(working copy)
@@ -53,6 +53,9 @@
 #include "alloc-pool.h"
 #include "tm-constrs.h"
 #include "opts.h"
+#include "tree-pass.h"
+#include "pass_manager.h"
+#include "context.h"
 
 #include 
 #include 
@@ -311,6 +314,7 @@
 static void sh_canonicalize_comparison (int *, rtx *, rtx *, bool);
 static void sh_canonicalize_comparison (enum rtx_code&, rtx&, rtx&,
 	enum machine_mode, bool);
+static bool sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2);
 
 static void sh_init_sync_libfuncs (void) ATTRIBUTE_UNUSED;
 
@@ -587,6 +591,9 @@
 #undef TARGET_CANONICALIZE_COMPARISON
 #define TARGET_CANONICALIZE_COMPARISON	sh_canonicalize_comparison
 
+#undef TARGET_FIXED_CONDITION_CODE_REGS
+#define TARGET_FIXED_CONDITION_CODE_REGS sh_fixed_condition_code_regs
+
 /* Machine-specific symbol_ref flags.  */
 #define SYMBOL_FLAG_FUNCVEC_FUNCTION	(SYMBOL_FLAG_MACH_DEP << 0)
 
@@ -710,6 +717,34 @@
 #undef err_ret
 }
 
+/* Register SH specific RTL passes.  */
+extern opt_pass* make_pass_ifcvt_sh (gcc::context* ctx, bool split_insns,
+ const char* name);
+static void
+register_sh_passes (void)
+{
+  if (!TARGET_SH1)
+return;
+
+/* Running the ifcvt_sh pass after ce1 generates better code when
+   comparisons are combined and reg-reg moves are introduced, because
+   reg-reg moves will be eliminated afterwards.  However, there are quite
+   some cases where combine will be unable to fold comparison related insns,
+   thus for now don't do it.
+  register_pass (make_pass_ifcvt_sh (g, false, "ifcvt1_sh"),
+		 PASS_POS_INSERT_AFTER, "ce1", 1);
+*/
+
+  /*  Run ifcvt_sh pass after combine but before register allocation.  */
+  register_pass (make_pass_ifcvt_sh (g, true, "ifcvt2_sh"),
+		 PASS_POS_INSERT_AFTER, "split1", 1);
+
+  /* Run ifcvt_sh pass after register allocation and basic block reordering
+ as this sometimes creates new opportunities.  */
+  register_pass (make_pass_ifcvt_sh (g, true, "ifcvt3_sh"),
+		 PASS_POS_INSERT_AFTER, "split4", 1);
+}
+
 /* Implement TARGET_OPTION_OVERRIDE macro.  Validate and override 
various options, and do some machine dependent initialization.  */
 static void
@@ -1022,6 +1057,8 @@
  target CPU.  */
   selected_atomic_model_
 = parse_validate_atomic_model_option (sh_atomic_model_str);
+
+  register_sh_passes ();
 }
 
 /* Print the operand address in x to the stream.  */
@@ -1908,7 +1945,7 @@
 static void
 sh_canonicalize_comparison (enum rtx_code& cmp, rtx& op0, rtx& op1,
 			enum machine_mode mode,
-			bool op0_preserve_value ATTRIBUTE_UNUSED)
+			bool op0_preserve_value)
 {
   /* When invoked from within the combine pass the mode is not specified,
  so try to get it from one of the operands.  */
@@ -1928,6 +1965,9 @@
   // Make sure that the constant operand is the second operand.
   if (CONST_INT_P (op0) && !CONST_INT_P (op1))
 {
+  if (op0_preserve_value)
+	return;
+
   std::swap (op0, op1);
   cmp = swap_condition (cmp);
 }
@@ -2016,6 +2056,14 @@
   *code = (int)tmp_code;
 }
 
+bool
+sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2)
+{
+  *p1 = T_REG;
+  *p2 = INVALID_REGNUM;
+  return true;
+}
+
 enum rtx_code
 prepare_cbranch_operands (rtx *operands, enum machine_mode mode,
 			  enum rtx_code comparison)
Index: gcc/config/sh/sh.md
===