Re: combiner: how to compute cost for bit insertion?

2017-07-11 Thread Segher Boessenkool
On Tue, Jul 11, 2017 at 11:14:20AM +0200, Georg-Johann Lay wrote:
> On 10.07.2017 23:40, Segher Boessenkool wrote:
> >On Mon, Jul 10, 2017 at 05:10:03PM +0200, Georg-Johann Lay wrote:
> >>Any ideas for a sane approach?
> >
> >You could change insn_rtx_cost to actually calculate the cost of the
> >insn, not just set_src_cost of a single set.  This will need checking
> >on a great many targets, not in the least because most target's cost
> >functions are, uh, not so good.  Big project, but hopefully well worth
> >the effort.
> 
> I already thought of such an approach by means of a new target hook;
> well aware that maintainers are not very fond of new hooks.

The big problem is that RTX costs are used for two different things.
They are used for getting a cost (estimate) for computing any random
expression, but they also are used for getting a cost for something
that is already known to be a valid machine instruction.

For most targets computing the latter would be much simpler and more
exact if we had a separate hook for it.  But we also still need the
former.  Perhaps we can teach GCC to not need that, or, if insn cost
is separated from expression cost computing the expression cost could
be cheaper as well.

> After all, what's important is to get best compilation results

Yes exactly!  And there is no quick fix I'm afraid.  For example,
something as obvious as costing a PARALLEL as the maximum cost of
any of its branches, regresses various targets.

> As my 3-liner for PR80929 is already pending review for > 1 month now,
> the prospects are not very promising for getting anything like that
> to be reviewed...

Okay, I'll go find it.  I cannot okay it but I can review it :-)

> >Or you change your cost function to on a QImode shift assume it is an
> >insert instruction.  Not correct of course (but neither is the currently
> >calculated cost), but perhaps it gives good results in practice?
> 
> Unlikely.  That machine has no barrel shifter, and when MUL or DIV is
> expanded using shifts, exact costs for shifts are needed.

No quick fixes :-(


Segher


Re: combiner: how to compute cost for bit insertion?

2017-07-11 Thread Bernd Schmidt
On 07/10/2017 05:10 PM, Georg-Johann Lay wrote:
> (set (zero_extract:QI (reg/i:QI 24 r24)
> (const_int 1 [0x1])
> (const_int 6 [0x6]))
> (lshiftrt:QI (reg:QI 52)
> (const_int 6 [0x6])))

> The problem is that the backend only sees
> 
> avr_rtx_costs[bset:combine(266)]=true (size) total=24, outer=set:
> (lshiftrt:QI (reg:QI 52)
> (const_int 6 [0x6]))

> How can I fix that?

I've thought for a while is that (set (zero_extract)) is a crazy way to
express this kind of operation. This should be

(set (destreg) (bfinsert (input) (bitpos) (len) (value))

Adding an rtx code like that is trivial, but ideally all the code that
understands set of zero_extract would also be updated to understand the
new form.


Bernd


Re: combiner: how to compute cost for bit insertion?

2017-07-11 Thread Georg-Johann Lay

On 10.07.2017 23:40, Segher Boessenkool wrote:

On Mon, Jul 10, 2017 at 05:10:03PM +0200, Georg-Johann Lay wrote:

Any ideas for a sane approach?


You could change insn_rtx_cost to actually calculate the cost of the
insn, not just set_src_cost of a single set.  This will need checking
on a great many targets, not in the least because most target's cost
functions are, uh, not so good.  Big project, but hopefully well worth
the effort.


I already thought of such an approach by means of a new target hook;
well aware that maintainers are not very fond of new hooks.

Something like so:

Add a new hook that, per default, returns a magic value which indicates
that it is not implemented or that it's too tedious to compute for a
backend.  In that case, SET_DEST is dropped and the usual rtx_cost hook
is run.

If a different value is returned, that value is used for the cost.

The changes would be mostly in rtlanal.c to use cost from that hook
or graceful degrade to classic rtx_cost.

That way, targets could migrate to that hook if they need more exact
cost or SET_DEST or want to dig into PARALLELs. Cf. PR80929.

Just passing a pattern or SET with outer_code = INSN might raise so many
performance degradations across all targets when they would return
some "unknown cost" or make unknown stuff expensive.  Hence passing
outer_code = INSN is not a good idea, except we find someone who is
proficient in all backends' costs and willing to go through that...

After all, what's important is to get best compilation results (at least
if we don'd want to hear even more of the "just switch to xxx compiler:
better code, better diagnostics, better support").

As my 3-liner for PR80929 is already pending review for > 1 month now,
the prospects are not very promising for getting anything like that
to be reviewed...



Or you change your cost function to on a QImode shift assume it is an
insert instruction.  Not correct of course (but neither is the currently
calculated cost), but perhaps it gives good results in practice?


Unlikely.  That machine has no barrel shifter, and when MUL or DIV is
expanded using shifts, exact costs for shifts are needed.

Johann



Segher



Re: combiner: how to compute cost for bit insertion?

2017-07-10 Thread Segher Boessenkool
On Mon, Jul 10, 2017 at 05:10:03PM +0200, Georg-Johann Lay wrote:
> Any ideas for a sane approach?

You could change insn_rtx_cost to actually calculate the cost of the
insn, not just set_src_cost of a single set.  This will need checking
on a great many targets, not in the least because most target's cost
functions are, uh, not so good.  Big project, but hopefully well worth
the effort.

Or you change your cost function to on a QImode shift assume it is an
insert instruction.  Not correct of course (but neither is the currently
calculated cost), but perhaps it gives good results in practice?


Segher


Re: Combiner fix for PR79910

2017-03-20 Thread Segher Boessenkool
On Sat, Mar 18, 2017 at 11:23:56AM -0500, Segher Boessenkool wrote:
> > >So, no, I'm not okay with this.  It is very expensive, it is doing open
> > >heart surgery on combine's internal structures (in a way that may or may
> > >not work), and all that to combine some insns in a case that should not
> > >exist anyway.
> > Please don't be dramatic.  If you've got a better suggestion for how to 
> > fix this, be my guest.  I don't like the compile-time cost either, but I 
> > don't see a better solution.
> 
> I'll commit the following monday or so:

Done now, with the previous patches to combine.c reverted.


Segher


Re: Combiner fix for PR79910

2017-03-18 Thread Segher Boessenkool
On Sat, Mar 18, 2017 at 11:23:56AM -0500, Segher Boessenkool wrote:
> Bootstrapped and regression checked on x86_64-linux and powerpc64-linux
> {-m32,-m64}.

Now also tested on aarch64-linux; no new failures.

Segher


> 2017-03-18  Segher Boessenkool  
> 
>   PR rtl-optimization/79910
>   * combine.c (can_combine_p): Do not allow combining an I0 or I1
>   if its dest is used by an insn before I2 (other than the combined
>   insns themselves, which are properly handled already).


Re: Combiner fix for PR79910

2017-03-18 Thread Segher Boessenkool
On Fri, Mar 17, 2017 at 04:23:57PM -0600, Jeff Law wrote:
> >>+  /* For combinations that may result in two insns, we have to gather
> >>+ some extra information about registers used, so that we can
> >>+ update all relevant LOG_LINKS later.  */
> >
> >Please just refuse to do the combination in such cases instead.
> ?!? That would essentially mean we can't do 3->2 combinations.

No, only those that would lead to trouble.

> >So, no, I'm not okay with this.  It is very expensive, it is doing open
> >heart surgery on combine's internal structures (in a way that may or may
> >not work), and all that to combine some insns in a case that should not
> >exist anyway.
> Please don't be dramatic.  If you've got a better suggestion for how to 
> fix this, be my guest.  I don't like the compile-time cost either, but I 
> don't see a better solution.

I'll commit the following monday or so:


Subject: [PATCH] combine: Fix 79910

If the dest of an I0 or I1 is used in an insn before I2, as can happen
in various uncommon cases, and we manage to do the combination, the set
is moved to I2, which is wrong.  Don't allow combining the insns in this
case.

This fixes the PR79910 testcase.  I also tested building Linux kernels
for all supported architectures: this fixes a bug for blackfin and
makes no changes on other architectures.

Bootstrapped and regression checked on x86_64-linux and powerpc64-linux
{-m32,-m64}.


Segher


2017-03-18  Segher Boessenkool  

PR rtl-optimization/79910
* combine.c (can_combine_p): Do not allow combining an I0 or I1
if its dest is used by an insn before I2 (other than the combined
insns themselves, which are properly handled already).

---
 gcc/combine.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index 3e5c439..24ecedf 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -1953,6 +1953,10 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn 
*pred ATTRIBUTE_UNUSED,
   || (succ2 && FIND_REG_INC_NOTE (succ2, dest))
   /* Don't substitute into a non-local goto, this confuses CFG.  */
   || (JUMP_P (i3) && find_reg_note (i3, REG_NON_LOCAL_GOTO, NULL_RTX))
+  /* Make sure that DEST is not used after INSN but before SUCC, or
+between SUCC and SUCC2.  */
+  || (succ && reg_used_between_p (dest, insn, succ))
+  || (succ2 && reg_used_between_p (dest, succ, succ2))
   /* Make sure that DEST is not used after SUCC but before I3.  */
   || (!all_adjacent
  && ((succ2
-- 
1.9.3



Re: Combiner fix for PR79910

2017-03-17 Thread Jeff Law

On 03/17/2017 04:14 PM, Segher Boessenkool wrote:

Thanks for not cc:ing me on any of this.
There's really no need for getting upset.  Bernd posted the message to 
the patches list.  That's sufficient.




On Wed, Mar 15, 2017 at 04:00:21PM +0100, Bernd Schmidt wrote:

+/* Set up a set of registers used in an insn.  Called through note_uses,
+   arguments as described for that function.  */
+
+static void
+record_used_regs (rtx *xptr, void *data)
+{
+  bitmap set = (bitmap)data;


Space after cast, throughout.

A nit.  Bernd, can you please fix this.




+  int i, j;
+  enum rtx_code code;
+  const char *fmt;
+  rtx x = *xptr;
+
+  /* repeat is used to turn tail-recursion into iteration since GCC
+ can't do it when there's no return value.  */
+ repeat:


This comment is incorrect.
Depends on the situation.  GCC's ability to turn tail recursion into 
iteration is not good.





+ /* If we are about to do the last recursive call
+needed at this level, change it into iteration.
+This function is called enough to be worth it.  */


This function is called enough that it should not be called at all.
I disagree.  We have a code gen bug. We need a solution.  Feel free to 
come up with something better.





+  /* For combinations that may result in two insns, we have to gather
+ some extra information about registers used, so that we can
+ update all relevant LOG_LINKS later.  */


Please just refuse to do the combination in such cases instead.

?!? That would essentially mean we can't do 3->2 combinations.




+   /* See comments above where we calculate the bitmap.  */
+   EXECUTE_IF_SET_IN_BITMAP ((bitmap)new_regs_in_i2,
+ LAST_VIRTUAL_REGISTER, i, iter)


Why do you need a cast here at all?


+ {
+   rtx reg = regno_reg_rtx[i];
+   rtx_insn *other;
+   for (other = NEXT_INSN (i2); other != i3; other = NEXT_INSN (other))
+ if (NONDEBUG_INSN_P (other)
+ && (reg_overlap_mentioned_p (reg, PATTERN (other))
+ || (CALL_P (other) && find_reg_fusage (other, USE, reg
+   {
+ if (dump_file)
+   fprintf (dump_file,
+"found extra use of reg %d at insn %d\n", i,
+INSN_UID (other));
+ insn_link **plink;
+ for (plink = _LINKS (other);
+  *plink;
+  plink = &(*plink)->next)
+   {
+ insn_link *link = *plink;
+ if (link->regno == i)
+   {
+ *plink = link->next;
+ link->next = i3links;
+ i3links = link;
+ break;
+   }
+   }
+ break;
+   }
+ }


This should be a separate function.

Bernd, can you pull this out into its own function.



So, no, I'm not okay with this.  It is very expensive, it is doing open
heart surgery on combine's internal structures (in a way that may or may
not work), and all that to combine some insns in a case that should not
exist anyway.
Please don't be dramatic.  If you've got a better suggestion for how to 
fix this, be my guest.  I don't like the compile-time cost either, but I 
don't see a better solution.


Jeff


Re: Combiner fix for PR79910

2017-03-17 Thread Jeff Law

On 03/17/2017 04:16 PM, Segher Boessenkool wrote:

On Wed, Mar 15, 2017 at 12:09:18AM +0100, Bernd Schmidt wrote:

I suppose at the moment we don't do 2->2 combinations, so we could
conditionalize this on having an i1.


You suppose wrong.  If one of the resulting insns is set_noop_p then
2->2 is allowed, for example.  Also in the hopefully not so far future
we will allow 2->2 in general.

But in a 2->2 where one is a nop we're not going to run into problems.

jeff


Re: Combiner fix for PR79910

2017-03-17 Thread Segher Boessenkool
On Wed, Mar 15, 2017 at 12:09:18AM +0100, Bernd Schmidt wrote:
> I suppose at the moment we don't do 2->2 combinations, so we could 
> conditionalize this on having an i1.

You suppose wrong.  If one of the resulting insns is set_noop_p then
2->2 is allowed, for example.  Also in the hopefully not so far future
we will allow 2->2 in general.


Segher


Re: Combiner fix for PR79910

2017-03-17 Thread Segher Boessenkool
Thanks for not cc:ing me on any of this.

On Wed, Mar 15, 2017 at 04:00:21PM +0100, Bernd Schmidt wrote:
> +/* Set up a set of registers used in an insn.  Called through note_uses,
> +   arguments as described for that function.  */
> +
> +static void
> +record_used_regs (rtx *xptr, void *data)
> +{
> +  bitmap set = (bitmap)data;

Space after cast, throughout.

> +  int i, j;
> +  enum rtx_code code;
> +  const char *fmt;
> +  rtx x = *xptr;
> +
> +  /* repeat is used to turn tail-recursion into iteration since GCC
> + can't do it when there's no return value.  */
> + repeat:

This comment is incorrect.

> +   /* If we are about to do the last recursive call
> +  needed at this level, change it into iteration.
> +  This function is called enough to be worth it.  */

This function is called enough that it should not be called at all.

> +  /* For combinations that may result in two insns, we have to gather
> + some extra information about registers used, so that we can
> + update all relevant LOG_LINKS later.  */

Please just refuse to do the combination in such cases instead.

> + /* See comments above where we calculate the bitmap.  */
> + EXECUTE_IF_SET_IN_BITMAP ((bitmap)new_regs_in_i2,
> +   LAST_VIRTUAL_REGISTER, i, iter)

Why do you need a cast here at all?

> +   {
> + rtx reg = regno_reg_rtx[i];
> + rtx_insn *other;
> + for (other = NEXT_INSN (i2); other != i3; other = NEXT_INSN (other))
> +   if (NONDEBUG_INSN_P (other)
> +   && (reg_overlap_mentioned_p (reg, PATTERN (other))
> +   || (CALL_P (other) && find_reg_fusage (other, USE, reg
> + {
> +   if (dump_file)
> + fprintf (dump_file,
> +  "found extra use of reg %d at insn %d\n", i,
> +  INSN_UID (other));
> +   insn_link **plink;
> +   for (plink = _LINKS (other);
> +*plink;
> +plink = &(*plink)->next)
> + {
> +   insn_link *link = *plink;
> +   if (link->regno == i)
> + {
> +   *plink = link->next;
> +   link->next = i3links;
> +   i3links = link;
> +   break;
> + }
> + }
> +   break;
> + }
> +   }

This should be a separate function.

So, no, I'm not okay with this.  It is very expensive, it is doing open
heart surgery on combine's internal structures (in a way that may or may
not work), and all that to combine some insns in a case that should not
exist anyway.


Segher


Re: Combiner fix for PR79910

2017-03-17 Thread Jeff Law

On 03/15/2017 11:07 AM, Jeff Law wrote:

On 03/15/2017 09:00 AM, Bernd Schmidt wrote:

On 03/15/2017 12:09 AM, Bernd Schmidt wrote:

 I'll retest with your
suggestion and with the bitmap creation conditional on i1 being nonnull.


Like this (also had to throw in a bitmap_empty_p). Retested as before.
Ok?

Yea, not surprised you needed the bitmap_empty_p.

OK with or without the debug counter -- your decide on whether or not to
include it.
Given your note that you had planned to remove the dbg_cnt stuff, I went 
ahead and remove the dbg_cnt stuff, and installed the patch (and testcase).


Jeff



Re: Combiner fix for PR79910

2017-03-15 Thread Jeff Law

On 03/15/2017 09:00 AM, Bernd Schmidt wrote:

On 03/15/2017 12:09 AM, Bernd Schmidt wrote:

 I'll retest with your
suggestion and with the bitmap creation conditional on i1 being nonnull.


Like this (also had to throw in a bitmap_empty_p). Retested as before. Ok?

Yea, not surprised you needed the bitmap_empty_p.

OK with or without the debug counter -- your decide on whether or not to 
include it.


jeff



Re: Combiner fix for PR79910

2017-03-15 Thread Bernd Schmidt

On 03/15/2017 04:00 PM, Bernd Schmidt wrote:

On 03/15/2017 12:09 AM, Bernd Schmidt wrote:

 I'll retest with your
suggestion and with the bitmap creation conditional on i1 being nonnull.


Like this (also had to throw in a bitmap_empty_p). Retested as before. Ok?


Oops, that one also has dbg_cnt stuff in it which I was going to remove. 
If you want to approve that along with the rest, the following bit is 
also needed:


Index: gcc/dbgcnt.def
===
--- gcc/dbgcnt.def  (revision 245685)
+++ gcc/dbgcnt.def  (working copy)
@@ -145,6 +145,7 @@ DEBUG_COUNTER (asan_use_after_scope)
 DEBUG_COUNTER (auto_inc_dec)
 DEBUG_COUNTER (ccp)
 DEBUG_COUNTER (cfg_cleanup)
+DEBUG_COUNTER (combine)
 DEBUG_COUNTER (cprop)
 DEBUG_COUNTER (cse2_move2add)
 DEBUG_COUNTER (dce)


Bernd


Re: Combiner fix for PR79910

2017-03-15 Thread Bernd Schmidt

On 03/15/2017 12:09 AM, Bernd Schmidt wrote:

 I'll retest with your
suggestion and with the bitmap creation conditional on i1 being nonnull.


Like this (also had to throw in a bitmap_empty_p). Retested as before. Ok?


Bernd

Index: gcc/combine.c
===
--- gcc/combine.c	(revision 245685)
+++ gcc/combine.c	(working copy)
@@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.
 #include "valtrack.h"
 #include "rtl-iter.h"
 #include "print-rtl.h"
+#include "dbgcnt.h"
 
 /* Number of attempts to combine instructions in this function.  */
 
@@ -2559,6 +2560,57 @@ can_split_parallel_of_n_reg_sets (rtx_in
   return true;
 }
 
+/* Set up a set of registers used in an insn.  Called through note_uses,
+   arguments as described for that function.  */
+
+static void
+record_used_regs (rtx *xptr, void *data)
+{
+  bitmap set = (bitmap)data;
+  int i, j;
+  enum rtx_code code;
+  const char *fmt;
+  rtx x = *xptr;
+
+  /* repeat is used to turn tail-recursion into iteration since GCC
+ can't do it when there's no return value.  */
+ repeat:
+  if (x == 0)
+return;
+
+  code = GET_CODE (x);
+  if (REG_P (x))
+{
+  unsigned regno = REGNO (x);
+  unsigned end_regno = END_REGNO (x);
+  while (regno < end_regno)
+	bitmap_set_bit (set, regno++);
+  return;
+}
+
+  /* Recursively scan the operands of this expression.  */
+
+  for (i = GET_RTX_LENGTH (code) - 1, fmt = GET_RTX_FORMAT (code); i >= 0; i--)
+{
+  if (fmt[i] == 'e')
+	{
+	  /* If we are about to do the last recursive call
+	 needed at this level, change it into iteration.
+	 This function is called enough to be worth it.  */
+	  if (i == 0)
+	{
+	  x = XEXP (x, 0);
+	  goto repeat;
+	}
+
+	  record_used_regs ( (x, i), data);
+	}
+  else if (fmt[i] == 'E')
+	for (j = 0; j < XVECLEN (x, i); j++)
+	  record_used_regs ( (x, i, j), data);
+}
+}
+
 /* Try to combine the insns I0, I1 and I2 into I3.
Here I0, I1 and I2 appear earlier than I3.
I0 and I1 can be zero; then we combine just I2 into I3, or I1 and I2 into
@@ -2742,6 +2794,27 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 
   added_links_insn = 0;
 
+  /* For combinations that may result in two insns, we have to gather
+ some extra information about registers used, so that we can
+ update all relevant LOG_LINKS later.  */
+  auto_bitmap i2_regset, i3_regset, links_regset;
+  if (i1)
+{
+  note_uses ( (i2), record_used_regs, (bitmap)i2_regset);
+  note_uses ( (i3), record_used_regs, (bitmap)i3_regset);
+  insn_link *ll;
+  FOR_EACH_LOG_LINK (ll, i3)
+	bitmap_set_bit (links_regset, ll->regno);
+  FOR_EACH_LOG_LINK (ll, i2)
+	bitmap_set_bit (links_regset, ll->regno);
+  if (i1)
+	FOR_EACH_LOG_LINK (ll, i1)
+	  bitmap_set_bit (links_regset, ll->regno);
+  if (i0)
+	FOR_EACH_LOG_LINK (ll, i0)
+	  bitmap_set_bit (links_regset, ll->regno);
+}
+
   /* First check for one important special case that the code below will
  not handle.  Namely, the case where I1 is zero, I2 is a PARALLEL
  and I3 is a SET whose SET_SRC is a SET_DEST in I2.  In that case,
@@ -4004,6 +4077,12 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	}
 }
 
+  if (!dbg_cnt (combine))
+{
+  undo_all ();
+  return 0;
+}
+
   /* If it still isn't recognized, fail and change things back the way they
  were.  */
   if ((insn_code_number < 0
@@ -4051,6 +4130,33 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
   return 0;
 }
 
+  auto_bitmap new_regs_in_i2;
+  if (newi2pat)
+{
+  /* We need to discover situations where we introduce a use of a
+	 register into I2, where none of the existing LOG_LINKS contain
+	 a reference to it.  This can happen if previously I3 referenced
+	 the reg, and there is an additional use between I2 and I3.  We
+	 must remove the LOG_LINKS entry from that additional use and
+	 distribute it along with our own ones.  */
+	note_uses (, record_used_regs, (bitmap)new_regs_in_i2);
+	bitmap_and_compl_into (new_regs_in_i2, i2_regset);
+	bitmap_and_compl_into (new_regs_in_i2, links_regset);
+
+	/* Here, we first look for situations where a hard register use
+	   moved, and just give up.  This should happen approximately
+	   never, and it's not worth it to deal with possibilities like
+	   multi-word registers.  Later, when fixing up LOG_LINKS, we
+	   deal with the case where a pseudo use moved.  */
+	if (!bitmap_empty_p (new_regs_in_i2)
+	&& prev_nonnote_insn (i3) != i2
+	&& bitmap_first_set_bit (new_regs_in_i2) < FIRST_PSEUDO_REGISTER)
+	  {
+	undo_all ();
+	return 0;
+	  }
+}
+
   if (MAY_HAVE_DEBUG_INSNS)
 {
   struct undo *undo;
@@ -4494,6 +4600,45 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 			NULL_RTX, NULL_RTX, NULL_RTX);
   }
 
+if (newi2pat)
+  {
+	bitmap_iterator iter;
+	unsigned int i;
+
+	/* See comments above where we calculate the bitmap.  */
+	

Re: Combiner fix for PR79910

2017-03-14 Thread Bernd Schmidt

On 03/15/2017 12:03 AM, Jeff Law wrote:

On 03/10/2017 04:24 PM, Bernd Schmidt wrote:


PR rtl-optimization/79910
* combine.c (record_used_regs): New static function.
(try_combine): Handle situations where there is an additional
instruction between I2 and I3 which needs to have a LOG_LINK
updated.

PR rtl-optimization/79910
* gcc.dg/torture/pr79910.c: New test.

What a nasty little problem.

I don't like that we have to build these bitmaps due to the compile-time
cost.  Would it make sense to only build them when i0/i1 exist?


I suppose at the moment we don't do 2->2 combinations, so we could 
conditionalize this on having an i1.



We don't do 4->3 combinations, just 4->2 and 3->2, so it's only the
i2pattern where we might need to conjure up a LOG_LINK, right?


We don't do 4->3 combinations, right?  So we only have to care about
when there's going to be an newi2pat, right (3->2 or 4->2).



We don't ever create a newi1pat (for a 4->3 combination), right?  So we
only have to worry about testing when there's a newi2pat, right?


Yes to all, there isn't a newi1pat, only 4->2 and 3->2 can be an issue.


+if (prev_nonnote_insn (i3) != i2)
+  for (unsigned r = 0; r < FIRST_PSEUDO_REGISTER; r++)
+if (bitmap_bit_p (new_regs_in_i2, r))

if (prev_nonnote_insn (i3) != i2
&& bitmap_first_set_bit (new_regs_in_i2) < FIRST_PSEUDO_REGISTER)


Ah. I had wondered about the loop but only thought in the direction of 
intersecting this bitmap with one of all hard regs (and I think there 
isn't such a bitmap, so I kept the loop). I'll retest with your 
suggestion and with the bitmap creation conditional on i1 being nonnull.



Bernd


Re: Combiner fix for PR79910

2017-03-14 Thread Jeff Law

On 03/10/2017 04:24 PM, Bernd Schmidt wrote:

In this PR, we have a few insns involved in two instruction combinations:

insn 16: set r100
insn 27: some calculation
insn 28: some calculation
insn 32: using r100
insn 33: using r100
insn 35: some calculation

Then we combine insns 27, 28 and 33, producing two output insns, As a
result, insn 28 (i2) now obtains a use of r100. But insn 32, which is
not involved in this combination, has the LOG_LINKS entry for that
register, and we don't know that we need to update it. As a result, the
second combination, involving regs 16, 32 and 35 (based on the remaining
LOG_LINK for r100), produces incorrect code, as we don't realize there's
a use of r100 before insn 32.

The following fixes it. Bootstrapped and tested on x86_64-linux, ok (on
all branches)?


Bernd


79910.diff


PR rtl-optimization/79910
* combine.c (record_used_regs): New static function.
(try_combine): Handle situations where there is an additional
instruction between I2 and I3 which needs to have a LOG_LINK
updated.

PR rtl-optimization/79910
* gcc.dg/torture/pr79910.c: New test.

What a nasty little problem.

I don't like that we have to build these bitmaps due to the compile-time 
cost.  Would it make sense to only build them when i0/i1 exist?


We don't do 4->3 combinations, just 4->2 and 3->2, so it's only the 
i2pattern where we might need to conjure up a LOG_LINK, right?



We don't do 4->3 combinations, right?  So we only have to care about 
when there's going to be an newi2pat, right (3->2 or 4->2).
We don't ever create a newi1pat (for a 4->3 combination), right?  So we 
only have to worry about testing when there's a newi2pat, right?



Do you need to handle 4->3, in which case the new bits could be on newi1pat?


Index: gcc/combine.c
===
--- gcc/combine.c   (revision 245685)
+++ gcc/combine.c   (working copy)
@@ -2559,6 +2560,57 @@ can_split_parallel_of_n_reg_sets (rtx_in
   return true;
 }




@@ -4051,6 +4124,33 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
   return 0;
 }

+  auto_bitmap new_regs_in_i2;
+  if (newi2pat)
+{
+  /* We need to discover situations where we introduce a use of a
+register into I2, where none of the existing LOG_LINKS contain
+a reference to it.  This can happen if previously I3 referenced
+the reg, and there is an additional use between I2 and I3.  We
+must remove the LOG_LINKS entry from that additional use and
+distribute it along with our own ones.  */
+   note_uses (, record_used_regs, (bitmap)new_regs_in_i2);
+   bitmap_and_compl_into (new_regs_in_i2, i2_regset);
+   bitmap_and_compl_into (new_regs_in_i2, links_regset);
+
+   /* Here, we first look for situations where a hard register use
+  moved, and just give up.  This should happen approximately
+  never, and it's not worth it to deal with possibilities like
+  multi-word registers.  Later, when fixing up LOG_LINKS, we
+  deal with the case where a pseudo use moved.  */
+   if (prev_nonnote_insn (i3) != i2)
+ for (unsigned r = 0; r < FIRST_PSEUDO_REGISTER; r++)
+   if (bitmap_bit_p (new_regs_in_i2, r))

if (prev_nonnote_insn (i3) != i2
&& bitmap_first_set_bit (new_regs_in_i2) < FIRST_PSEUDO_REGISTER)

?



Overall it seems reasonable.  If possible, let's avoid the calls to 
create the bitmaps in case where we know we'll never be creating a new 
use in i2.


I'm not wed to using bitmap_first_set_bit.  It's clearer and may be 
faster since you're not doing a ton of calls into bitmap_bit_p.  The 
bitmap_first_set_bit implementation under the hood finds the word within 
the first bitmap element that is nonzero, then uses ctzl on that to get 
its bit.


Jeff


Re: combiner

2010-10-26 Thread Georg Lay
roy rosen schrieb:
 Thanks Georg,
 The -fdump-rtl-combine-details really helps.
 Regarding implementing it through cbranchhi4, this is not enough for
 me because when getting to this pattern the operands have already been
 expanded, and I am trying to prevent that.
 Is there a way around it?

 Roy.

 2010/10/25 Georg Lay a...@gjlay.de:
 roy rosen schrieb:
 In my port I get to such a situation:

 (insn 60 59 61 4 a.c:65 (set (subreg:SI (reg:HI 129 [ __prephitmp_4 ]) 0)
 (zero_extract:SI (subreg:SI (reg/v:DI 138 [ v4hi1 ]) 4)
 (const_int 16 [0x10])
 (const_int 16 [0x10]))) 53 {extzv} (nil))

 (insn 61 60 62 4 a.c:65 (set (reg:BI 159)
 (gtu:BI (reg:HI 129 [ __prephitmp_4 ])
 (reg/v:HI 143 [ usiThresh ]))) 94 {cmprr_hi_gtu} (nil))

 The patterns for these are:

 (define_insn extzv
   [(set (match_operand:SI 0 register_operand =d)
 (zero_extract:SI (match_operand:SI 1 register_operand d)
  (match_operand:SI 2 immediate_operand U06)
  (match_operand:SI 3 immediate_operand U06)))]
   
   extractua\t %2, %3, %1, %0 %!
 )

 and

 (define_insn cmprr_hi_code
   [(set (match_operand:BI 0 register_operand =c)
 (any_cond_rr:BI (match_operand:HI 1 register_operand d)
  (match_operand:HI 2 register_operand d)))]
   
   cmpcode %2.L, %1.L, %0:%I0 %!
 )

 I want the combiner to combine both insns since I have an intruction
 which can compare from an HI partial register.
 I am trying to write an insn pattern for that but the combiner does not use 
 i.
 I thought about something like:

 (define_insn cmprr_hi_code_1
   [(set (match_operand:BI 0 register_operand =c)
 (any_cond_rr:BI (zero_extract:SI (match_operand:DI 1
 register_operand d)
  (const_int 16) (const_int 16))
  (match_operand:HI 2 register_operand d)))]
   
   cmpcode %2.L, %1.L, %0:%I0 %!
 )

 but it does not work.
 Can someone please help?
 To compare HI there are standard insns cstorehi4 and cbranchhi4. Maybe 
 you
 wand to implement an insn/expander that supports them.

 As far as the combiner is concerned: Look at combiner dumps with
 -fdump-rtl-combine-details if the combiner tries to combine these insns and 
 the
 way it does this. It's useless to write patterns that combine will never 
 try...

 Moreover, the first pattern looks rather like an insv than an extract. So 
 it
 might help to write proper insv-expanders.

Hard to tell without seeing what combine tries and how insv expands.

In some situations I get better results by expanding insv algebraically, i.e. as
a combination of AND and IOR stuff. This is a little bit tedious but allows the
output operand to be different from the insertion dest.

Georg Lay


Re: combiner

2010-10-25 Thread Ian Lance Taylor
roy rosen roy.1ro...@gmail.com writes:

 In my port I get to such a situation:

 (insn 60 59 61 4 a.c:65 (set (subreg:SI (reg:HI 129 [ __prephitmp_4 ]) 0)
 (zero_extract:SI (subreg:SI (reg/v:DI 138 [ v4hi1 ]) 4)
 (const_int 16 [0x10])
 (const_int 16 [0x10]))) 53 {extzv} (nil))

 (insn 61 60 62 4 a.c:65 (set (reg:BI 159)
 (gtu:BI (reg:HI 129 [ __prephitmp_4 ])
 (reg/v:HI 143 [ usiThresh ]))) 94 {cmprr_hi_gtu} (nil))

 The patterns for these are:

 (define_insn extzv
   [(set (match_operand:SI 0 register_operand =d)
 (zero_extract:SI (match_operand:SI 1 register_operand d)
  (match_operand:SI 2 immediate_operand U06)
  (match_operand:SI 3 immediate_operand U06)))]
   
   extractua\t %2, %3, %1, %0 %!
 )

 and

 (define_insn cmprr_hi_code
   [(set (match_operand:BI 0 register_operand =c)
 (any_cond_rr:BI (match_operand:HI 1 register_operand d)
  (match_operand:HI 2 register_operand d)))]
   
   cmpcode %2.L, %1.L, %0:%I0 %!
 )

 I want the combiner to combine both insns since I have an intruction
 which can compare from an HI partial register.
 I am trying to write an insn pattern for that but the combiner does not use i.
 I thought about something like:

 (define_insn cmprr_hi_code_1
   [(set (match_operand:BI 0 register_operand =c)
 (any_cond_rr:BI (zero_extract:SI (match_operand:DI 1
 register_operand d)
  (const_int 16) (const_int 16))
  (match_operand:HI 2 register_operand d)))]
   
   cmpcode %2.L, %1.L, %0:%I0 %!
 )

 but it does not work.


That insn won't work because you don't have a DImode operand to your
zero_extract.  You have an SImode operand.  It's a subeg of a DImode
operand, but that doesn't matter.

I'm not sure how well combine will work with the paradoxical subreg
(subreg:SI (reg:HI 129 [ __prephitmp_4 ]) 0) though.

Ian