Re: [PATCH, rs6000] Fixing PR 67145

2016-03-01 Thread Peter Bergner
On Mon, 2016-02-29 at 20:45 -0600, Segher Boessenkool wrote:
> On Mon, Feb 29, 2016 at 11:11:11AM -0800, Richard Henderson wrote:
> > > Where are these canonicalization rules described?
> > 
> > swap_commutative_operands_p?
> 
> That function never swaps reg+reg, or I don't see it.

commutative_operand_precedence gives registers that hold pointers
a higher precedence than registers that don't hold pointers, so
it can swap reg+reg if the second reg holds a pointer.
We did this as part of PR28690.

Peter





Re: [PATCH, rs6000] Fixing PR 67145

2016-02-29 Thread Segher Boessenkool
On Mon, Feb 29, 2016 at 11:11:11AM -0800, Richard Henderson wrote:
> >>> If you check for fixed registers as well here, does that work for you?
> >>
> >> Maybe.  It prevents canonicalization of reg+fp vs fp+reg, which could well
> >> occur via arithmetic on locally allocated arrays.
> > 
> > Where are these canonicalization rules described?
> 
> swap_commutative_operands_p?

That function never swaps reg+reg, or I don't see it.

> > It is stage 4.  This rs6000 change has almost 100% chance of introducing
> > regressions.
> 
> Really?  Nearly 100%?
> 
> Ignoring the change to subf<>3_carry_in_m1 for a moment, how do any of the
> other changes result in the non-recognition of rtl that was previously valid?
> As far as I can see we only accept more rtl.

It's changing a lot of backend patterns.  There can and will be side
effects.  You're replacing a lot of RTL generation by open-coded stuff,
that's the wrong direction.

You're putting all the risk on a different backend for fixing a minor
regression in x86.


Segher


Re: [PATCH, rs6000] Fixing PR 67145

2016-02-29 Thread Richard Henderson
On 02/26/2016 01:52 PM, Segher Boessenkool wrote:
> On Fri, Feb 26, 2016 at 01:35:10PM -0800, Richard Henderson wrote:
>> On 02/26/2016 01:03 PM, Segher Boessenkool wrote:
>>> On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
 +  /* Perform rematerialization if only all operands are registers and
 + all operations are PLUS.  */
 +  for (i = 0; i < n_ops; i++)
 +  if (ops[i].neg || !REG_P (ops[i].op))
 +return NULL_RTX;
 +  goto gen_result;
 +}
>>>
>>> If you check for fixed registers as well here, does that work for you?
>>
>> Maybe.  It prevents canonicalization of reg+fp vs fp+reg, which could well
>> occur via arithmetic on locally allocated arrays.
> 
> Where are these canonicalization rules described?

swap_commutative_operands_p?


> It is stage 4.  This rs6000 change has almost 100% chance of introducing
> regressions.

Really?  Nearly 100%?

Ignoring the change to subf<>3_carry_in_m1 for a moment, how do any of the
other changes result in the non-recognition of rtl that was previously valid?
As far as I can see we only accept more rtl.


r~


Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Segher Boessenkool
On Fri, Feb 26, 2016 at 01:35:10PM -0800, Richard Henderson wrote:
> On 02/26/2016 01:03 PM, Segher Boessenkool wrote:
> > On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
> >> +  /* Perform rematerialization if only all operands are registers and
> >> + all operations are PLUS.  */
> >> +  for (i = 0; i < n_ops; i++)
> >> +  if (ops[i].neg || !REG_P (ops[i].op))
> >> +return NULL_RTX;
> >> +  goto gen_result;
> >> +}
> > 
> > If you check for fixed registers as well here, does that work for you?
> 
> Maybe.  It prevents canonicalization of reg+fp vs fp+reg, which could well
> occur via arithmetic on locally allocated arrays.

Where are these canonicalization rules described?

> I guess for the purposes of stage4 I'd be willing to do
> 
>   if (ops[i].neg
>   || !REG_P (ops[i].op)
>   || (REGNO (ops[i].op) < FIRST_PSEUDO_REGISTER
>   && fixed_regs[REGNO (ops[i].op)]
>   && !global_regs[REGNO (ops[i].op)]
>   && ops[i].op != frame_pointer_rtx
>   && ops[i].op != arg_pointer_rtx
>   && ops[i].op != stack_pointer_rtx))
> 
> It's pretty ugly though, and I wouldn't want to keep this forever.

Yeah.

> The rs6000 change really ought to be evaluated at some point.  Given its 
> scope,
> I see little difference to doing that now vs putting it off to gcc7.

It is stage 4.  This rs6000 change has almost 100% chance of introducing
regressions.


Segher


Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Segher Boessenkool
On Fri, Feb 26, 2016 at 01:10:17PM -0800, Richard Henderson wrote:
> On 02/26/2016 01:01 PM, Segher Boessenkool wrote:
> >> How do you imagine the rs6000 change will regress codegen?
> > 
> > Combine of sequences with double-length adds.
> 
> What sort of test case are you imagining here?  The trivial tests I've looked
> at have all been optimal (before and after).

There are many different combinations, for example (32-bit code)

long long add_s42(long long a, int b)
{
return a + ((long long) b << 32) + 42;
}

and we generate optimal code on BE for all of those (not on LE yet,
and there is the issue with open-coded carry chains, and and and, and
things shift around.  Also all the scc things.  But I really should
submit some testcases).


Segher


Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Richard Henderson
On 02/26/2016 01:03 PM, Segher Boessenkool wrote:
> On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
>> +  /* Perform rematerialization if only all operands are registers and
>> + all operations are PLUS.  */
>> +  for (i = 0; i < n_ops; i++)
>> +if (ops[i].neg || !REG_P (ops[i].op))
>> +  return NULL_RTX;
>> +  goto gen_result;
>> +}
> 
> If you check for fixed registers as well here, does that work for you?

Maybe.  It prevents canonicalization of reg+fp vs fp+reg, which could well
occur via arithmetic on locally allocated arrays.

I guess for the purposes of stage4 I'd be willing to do

  if (ops[i].neg
  || !REG_P (ops[i].op)
  || (REGNO (ops[i].op) < FIRST_PSEUDO_REGISTER
  && fixed_regs[REGNO (ops[i].op)]
  && !global_regs[REGNO (ops[i].op)]
  && ops[i].op != frame_pointer_rtx
  && ops[i].op != arg_pointer_rtx
  && ops[i].op != stack_pointer_rtx))

It's pretty ugly though, and I wouldn't want to keep this forever.

The rs6000 change really ought to be evaluated at some point.  Given its scope,
I see little difference to doing that now vs putting it off to gcc7.


r~


Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Richard Henderson
On 02/26/2016 01:01 PM, Segher Boessenkool wrote:
>> How do you imagine the rs6000 change will regress codegen?
> 
> Combine of sequences with double-length adds.

What sort of test case are you imagining here?  The trivial tests I've looked
at have all been optimal (before and after).

> It failed more recently though, that's what r215429 is about.  Oh, before
> that it did test just the code, didn't use register_operand.  *facepalm*

Ah yes, exactly.  I didn't think to annotate ca_operand for more recent
changes.  ;-)


r~



Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Segher Boessenkool
On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 450fa8b..9d55e7b 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -4421,9 +4421,17 @@ simplify_plus_minus (enum rtx_code code, machine_mode 
> mode, rtx op0,
>n_ops = i;
>  }
>  
> -  /* If nothing changed, fail.  */
> +  /* If nothing changed, check that rematerialization of rtl instructions
> + is still required.  */
>if (!canonicalized)
> -return NULL_RTX;
> +{
> +  /* Perform rematerialization if only all operands are registers and
> + all operations are PLUS.  */
> +  for (i = 0; i < n_ops; i++)
> + if (ops[i].neg || !REG_P (ops[i].op))
> +   return NULL_RTX;
> +  goto gen_result;
> +}

If you check for fixed registers as well here, does that work for you?


Segher


Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Segher Boessenkool
On Fri, Feb 26, 2016 at 12:51:10PM -0800, Richard Henderson wrote:
> > What is the rs6000 ICE?
> 
> The simplify-rtx.c patch causes (reg:M ca) to get sorted to a different spot 
> in
> the (plus (plus r1 r2) r3) chain than the rs6000 backend expects, producing an
> ICE due to an unrecognizable insn.

So it changes existing RTL to a form that does not pass recog.  Uh.

> How do you imagine the rs6000 change will regress codegen?

Combine of sequences with double-length adds.

> >> ca_operand doesn't work as written, since CA_REGNO is not an available 
> >> register, and thus doesn't satisfy register_operand.
> > 
> > Yes, curious.  But when it was written it *did* match.  Huh.
> 
> It probably started to fail with r181760, back in 2011.  But the predicate
> wasn't much used in the rs6000 backend, so I guess it hadn't really mattered.

It failed more recently though, that's what r215429 is about.  Oh, before
that it did test just the code, didn't use register_operand.  *facepalm*


Segher



Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Richard Henderson
On 02/26/2016 12:41 PM, Segher Boessenkool wrote:
> On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
>> It's the simplify-rtx.c portion of the patch that fixes the i686 regression.
>>
>> In the PR, Alan raises some good points, but I don't believe that we can 
>> address those for gcc6.  A new rtl reassoc optimization that takes loop 
>> invariance into account will have to wait.
>>
>> But we do need to take care of the rs6000 ice that results, and that's the 
>> bulk of the patch -- allowing CA to be sorted to any register of the plus 
>> chain.
> 
> The rs6000 change is much too risky for stage 4 as well in my opinion.
> It also seems it will regress codegen (for which we have no testcases
> yet, sigh).
> 
> What is the rs6000 ICE?

The simplify-rtx.c patch causes (reg:M ca) to get sorted to a different spot in
the (plus (plus r1 r2) r3) chain than the rs6000 backend expects, producing an
ICE due to an unrecognizable insn.

The rs6000 patch changes the backend to allow (reg:M ca) to appear exactly once
in (plus (plus r1 r2) r3).  And we do have a test case for it -- the ICE.

How do you imagine the rs6000 change will regress codegen?

>> ca_operand doesn't work as written, since CA_REGNO is not an available 
>> register, and thus doesn't satisfy register_operand.
> 
> Yes, curious.  But when it was written it *did* match.  Huh.

It probably started to fail with r181760, back in 2011.  But the predicate
wasn't much used in the rs6000 backend, so I guess it hadn't really mattered.

>> Is there any particular reason that subf<>3_carry_in_m1 was written with 
>> minus rather than plus like all of the other patterns?
> 
> It's what RTL simplification comes up with, so it is the only thing
> combine tries.  The same reason why there are separate patterns for 0
> and -1 at all.

Well, it appears to use the new pattern just as well.  A few trivial tests are
in fact simplified as one would expect during combine.


r~


Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread Segher Boessenkool
On Thu, Feb 25, 2016 at 09:08:32PM -0800, Richard Henderson wrote:
> It's the simplify-rtx.c portion of the patch that fixes the i686 regression.
> 
> In the PR, Alan raises some good points, but I don't believe that we can 
> address those for gcc6.  A new rtl reassoc optimization that takes loop 
> invariance into account will have to wait.
> 
> But we do need to take care of the rs6000 ice that results, and that's the 
> bulk of the patch -- allowing CA to be sorted to any register of the plus 
> chain.

The rs6000 change is much too risky for stage 4 as well in my opinion.
It also seems it will regress codegen (for which we have no testcases
yet, sigh).

What is the rs6000 ICE?

> Some notes:
> 
> ca_operand doesn't work as written, since CA_REGNO is not an available 
> register, and thus doesn't satisfy register_operand.

Yes, curious.  But when it was written it *did* match.  Huh.

> Is there any particular reason that subf<>3_carry_in_m1 was written with 
> minus rather than plus like all of the other patterns?

It's what RTL simplification comes up with, so it is the only thing
combine tries.  The same reason why there are separate patterns for 0
and -1 at all.


Segher


Re: [PATCH, rs6000] Fixing PR 67145

2016-02-26 Thread David Edelsohn
On Fri, Feb 26, 2016 at 12:08 AM, Richard Henderson  wrote:
> It's the simplify-rtx.c portion of the patch that fixes the i686 regression.
>
> In the PR, Alan raises some good points, but I don't believe that we can
> address those for gcc6.  A new rtl reassoc optimization that takes loop
> invariance into account will have to wait.
>
> But we do need to take care of the rs6000 ice that results, and that's the
> bulk of the patch -- allowing CA to be sorted to any register of the plus
> chain.
>
> Some notes:
>
> ca_operand doesn't work as written, since CA_REGNO is not an available
> register, and thus doesn't satisfy register_operand.
>
> Is there any particular reason that subf<>3_carry_in_m1 was written with
> minus rather than plus like all of the other patterns?

Segher wrote the rs6000 carry infrastructure, so he's the best one to comment.

- David


[PATCH, rs6000] Fixing PR 67145

2016-02-25 Thread Richard Henderson

It's the simplify-rtx.c portion of the patch that fixes the i686 regression.

In the PR, Alan raises some good points, but I don't believe that we can 
address those for gcc6.  A new rtl reassoc optimization that takes loop 
invariance into account will have to wait.


But we do need to take care of the rs6000 ice that results, and that's the bulk 
of the patch -- allowing CA to be sorted to any register of the plus chain.


Some notes:

ca_operand doesn't work as written, since CA_REGNO is not an available 
register, and thus doesn't satisfy register_operand.


Is there any particular reason that subf<>3_carry_in_m1 was written with minus 
rather than plus like all of the other patterns?



Tested on ppc64le-linux.


r~
* simplify-rtx.c (simplify_plus_minus): If only PLUS of REG,
rematerialize the canonical chain.

* config/rs6000/predicates.md (ca_operand): Use match_code.
(gpc_ca_reg_operand): New.
* config/rs6000/rs6000.md (add3_carry_in): Expand by hand.
(subf3_carry_in): Likewise.
(*add3_carry_in_reg): Rename from *add3_carry_in_internal;
accept CA at any register position, exactly once.
(*add3_carry_in_0): Similarly.
(*add3_carry_in_m1): Similarly.
(*subf3_carry_in_reg): Similarly.
(*subf3_carry_in_0): Rename with leading *.
(*subf3_carry_in_m1): Rename with leading *; use the same
plus/plus/not form as subf3_carry_in_reg.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 072291e..98937bd 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -116,7 +116,7 @@
 
 ;; Return 1 if op is the carry register.
 (define_predicate "ca_operand"
-  (match_operand 0 "register_operand")
+  (match_code "reg,subreg")
 {
   if (GET_CODE (op) == SUBREG)
 op = SUBREG_REG (op);
@@ -230,6 +230,10 @@
   return INT_REGNO_P (REGNO (op)) || FP_REGNO_P (REGNO (op));
 })
 
+(define_predicate "gpc_ca_reg_operand"
+  (ior (match_operand 0 "gpc_reg_operand")
+   (match_operand 0 "ca_operand")))
+
 ;; Return 1 if op is a general purpose register.  Unlike gpc_reg_operand, don't
 ;; allow floating point or vector registers.
 (define_predicate "int_reg_operand"
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 0299a00..b300727 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -1770,50 +1770,65 @@
 (define_expand "add3_carry_in"
   [(parallel [
  (set (match_operand:GPR 0 "gpc_reg_operand")
- (plus:GPR (plus:GPR (match_operand:GPR 1 "gpc_reg_operand")
- (match_operand:GPR 2 "adde_operand"))
-   (reg:GPR CA_REGNO)))
+ (plus:GPR
+   (plus:GPR
+ (reg:GPR CA_REGNO)
+  (match_operand:GPR 1 "gpc_reg_operand"))
+   (match_operand:GPR 2 "adde_operand")))
  (clobber (reg:GPR CA_REGNO))])]
   ""
 {
-  if (operands[2] == const0_rtx)
-{
-  emit_insn (gen_add3_carry_in_0 (operands[0], operands[1]));
-  DONE;
-}
-  if (operands[2] == constm1_rtx)
-{
-  emit_insn (gen_add3_carry_in_m1 (operands[0], operands[1]));
-  DONE;
-}
+  rtx ca = gen_rtx_REG (mode, CA_REGNO);
+  rtx x, y;
+
+  x = gen_rtx_PLUS (mode, ca, operands[1]);
+  if (operands[2] != const0_rtx)
+x = gen_rtx_PLUS (mode, x, operands[2]);
+  x = gen_rtx_SET (operands[0], x);
+  y = gen_rtx_CLOBBER (VOIDmode, ca);
+
+  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, x, y)));
+  DONE;
 })
 
-(define_insn "*add3_carry_in_internal"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
-   (plus:GPR (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
-   (match_operand:GPR 2 "gpc_reg_operand" "r"))
- (reg:GPR CA_REGNO)))
+(define_insn "*add3_carry_in_reg"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r")
+   (plus:GPR
+ (plus:GPR
+   (match_operand:GPR 1 "gpc_ca_reg_operand" "z,r,r")
+   (match_operand:GPR 2 "gpc_ca_reg_operand" "r,z,r"))
+ (match_operand:GPR 3 "gpc_ca_reg_operand"   "r,r,z")))
(clobber (reg:GPR CA_REGNO))]
-  ""
-  "adde %0,%1,%2"
+  "ca_operand (operands[1], mode)
+   + ca_operand (operands[2], mode)
+   + ca_operand (operands[3], mode) == 1"
+  "@
+   adde %0,%2,%3
+   adde %0,%1,%3
+   adde %0,%1,%2"
   [(set_attr "type" "add")])
 
-(define_insn "add3_carry_in_0"
+(define_insn "*add3_carry_in_0"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
-   (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
- (reg:GPR CA_REGNO)))
+   (plus:GPR
+ (match_operand:GPR 1 "gpc_ca_reg_operand" "%r")
+ (match_operand:GPR 2 "gpc_ca_reg_operand" "z")))
(clobber (reg:GPR CA_REGNO))]
-  ""
+  "ca_operand (operands[1], mode)
+   + ca_operand (operands[2], mode) == 1"
   "addze %0,%1"
   [(set_attr "type" "add")])
 
-(define_insn "add3_carry_in_m1"
+(define_insn