Re: [PATCH] combine: If a parallel I2 was split, do not allow a new I2 (PR64268)

2014-12-16 Thread Segher Boessenkool
On Tue, Dec 16, 2014 at 07:28:22AM -0600, Segher Boessenkool wrote:
> I did a run for powerpc64, one for powerpc, and one for x86-64.
> 
> The powerpc64 bootstrap was with pre-installed GMP etc.; the others
> had those libraries in-tree.
> 
> "type1" is when try_combine used the ancient combine code to split a
> parallel set and set of cc; "type2" is when it used my code to split
> any other parallel that sets two things; and "type0" is when it didn't
> do either but still ended up with I1 and I2 the same UID (I think it
> might be called with the same insn twice; not a good thing, it does
> not know how to handle this; and it is really worrisome that it then
> sometimes succeeds in combining it).
> 
> "tries" is how often that split-orig-I2-to-two code is used; "recog"
> is how often it reached the first call to recog (so it passed
> can_combine_p etc.); "fail" is how often it eventually failed (after
> reaching recog), "one" is how often it combined to one insn, "two"
> is how often it combined to two.
> 
> 
> powerpc64
>   tries   recog   failone two
> type1 39214   39214   38944   202 18
> type2 21540   18968   18928   2   38
> type0 292 289 0   3
> 
> 
> powerpc
>   tries   recog   failone two
> type1 21654   21654   21167   485 2
> type2 21839   19754   19243   0   509 (*)
> type0 427 294 0   133
> 
> 
> x86-64
>   tries   recog   failone two
> type1 17387   17387   17288   70  29
> type2 40413   31681   30242   60  1369
> type0 0   0   0   0

The new numbers:

powerpc64
tries   recog   failone two
type1   39216   39216   38996   202 18
type2   21540   18968   18928   2   38


powerpc
tries   recog   failone two
type1   45276   45276   44196   104040
type2   46149   41589   40553   0   1032


x86-64
tries   recog   failone two
type1   17387   17387   17288   70  29
type2   40088   31625   30196   60  1369


Segher


Re: [PATCH] combine: If a parallel I2 was split, do not allow a new I2 (PR64268)

2014-12-16 Thread Segher Boessenkool
On Tue, Dec 16, 2014 at 07:28:22AM -0600, Segher Boessenkool wrote:
> and "type0" is when it didn't
> do either but still ended up with I1 and I2 the same UID (I think it
> might be called with the same insn twice; not a good thing, it does
> not know how to handle this; and it is really worrisome that it then
> sometimes succeeds in combining it).

... and when I make try_combine immediately refuse duplicates in I0,I1,I2
everything works again.

When INSN_UID (i1) == INSN_UID (i2), calling df_insn_delete (i1) does
set the "should be deleted" flag on i2 instead, but try_combine later
calls df_insn_rescan (i2) which resets it again.  This is icky and
looks very wrong in the debug dump, but it does work fine.

Doing an (effictively) 2->2 combine like this still feels quite unsafe
wrt combine possibly looping, but it has worked for over twenty years.

Bootstrapping, making new test results, will send a new patch later
today.

Thanks for pushing me to look much closer,


Segher


Re: [PATCH] combine: If a parallel I2 was split, do not allow a new I2 (PR64268)

2014-12-16 Thread Segher Boessenkool
On Mon, Dec 15, 2014 at 10:24:47AM -0600, Segher Boessenkool wrote:
> On Mon, Dec 15, 2014 at 04:51:14PM +0100, Paolo Bonzini wrote:
> > 1) did you check that it never triggers on e.g. an x86 bootstrap, and
> > that it doesn't trigger too often on PPC64?
> 
> I have checked on my largish connection of tests for the carry insns
> on PowerPC, and only two (related) transforms are disabled, and they
> aren't too important anyway.  Well, and the bad transforms are disabled,
> only just two of-em but much more frequent (long long x; x--;).
> 
> I haven't checked on x86, but it's a bugfix: don't do things that blow up!
> It is amazing to me that it didn't show up before.  One theory is that
> instructions that set the condition code as well as a GPR will never
> combine with a later insn to two insns, always to just one.  But nothing
> made this explicit so AFAICS it is just an accident that it worked before.
> 
> I'll do an instrumented x86 bootstrap.

I did a run for powerpc64, one for powerpc, and one for x86-64.

The powerpc64 bootstrap was with pre-installed GMP etc.; the others
had those libraries in-tree.

"type1" is when try_combine used the ancient combine code to split a
parallel set and set of cc; "type2" is when it used my code to split
any other parallel that sets two things; and "type0" is when it didn't
do either but still ended up with I1 and I2 the same UID (I think it
might be called with the same insn twice; not a good thing, it does
not know how to handle this; and it is really worrisome that it then
sometimes succeeds in combining it).

"tries" is how often that split-orig-I2-to-two code is used; "recog"
is how often it reached the first call to recog (so it passed
can_combine_p etc.); "fail" is how often it eventually failed (after
reaching recog), "one" is how often it combined to one insn, "two"
is how often it combined to two.


powerpc64
tries   recog   failone two
type1   39214   39214   38944   202 18
type2   21540   18968   18928   2   38
type0   292 289 0   3


powerpc
tries   recog   failone two
type1   21654   21654   21167   485 2
type2   21839   19754   19243   0   509 (*)
type0   427 294 0   133


x86-64
tries   recog   failone two
type1   17387   17387   17288   70  29
type2   40413   31681   30242   60  1369
type0   0   0   0   0


Not sure what to make of the high number in the x86-64 type2/two
result.


Segher


(*) The (32-bit) powerpc bootstrap failed (that's what the patch
is trying to rectify, after all); the columns on this line don't
add up correctly (two are missing; this was a -j60 build).


Re: [PATCH] combine: If a parallel I2 was split, do not allow a new I2 (PR64268)

2014-12-16 Thread Paolo Bonzini


On 15/12/2014 17:24, Segher Boessenkool wrote:
> On Mon, Dec 15, 2014 at 04:51:14PM +0100, Paolo Bonzini wrote:
>> Random questions:
>>
>> 1) did you check that it never triggers on e.g. an x86 bootstrap, and
>> that it doesn't trigger too often on PPC64?
> 
> I have checked on my largish connection of tests for the carry insns
> on PowerPC, and only two (related) transforms are disabled, and they
> aren't too important anyway.  Well, and the bad transforms are disabled,
> only just two of-em but much more frequent (long long x; x--;).
> 
> I haven't checked on x86, but it's a bugfix: don't do things that blow up!

I agree---I just want to be sure of the extent of the change.

> [ That testcase, -m32:
> 
> long long addSH(long long a, unsigned long b)
> {
>   return a + ((unsigned long long)b << 32);
> }
> 
> results in  addic 4,4,0 ; addze 3,5   while it could just be  add 3,5  ]

Ah, that's a pity.  But breaking x-- is much worse.

>> 2) if it triggers rarely, should combine just try and give a new UID to
>> i1?
> 
> That should in principle works, sure.  Seems too dangerous for stage3
> though.

Indeed.  Just thinking about it for 5.1.

Paolo


Re: [PATCH] combine: If a parallel I2 was split, do not allow a new I2 (PR64268)

2014-12-15 Thread Segher Boessenkool
On Mon, Dec 15, 2014 at 04:51:14PM +0100, Paolo Bonzini wrote:
> Random questions:
> 
> 1) did you check that it never triggers on e.g. an x86 bootstrap, and
> that it doesn't trigger too often on PPC64?

I have checked on my largish connection of tests for the carry insns
on PowerPC, and only two (related) transforms are disabled, and they
aren't too important anyway.  Well, and the bad transforms are disabled,
only just two of-em but much more frequent (long long x; x--;).

I haven't checked on x86, but it's a bugfix: don't do things that blow up!
It is amazing to me that it didn't show up before.  One theory is that
instructions that set the condition code as well as a GPR will never
combine with a later insn to two insns, always to just one.  But nothing
made this explicit so AFAICS it is just an accident that it worked before.

I'll do an instrumented x86 bootstrap.

[ That testcase, -m32:

long long addSH(long long a, unsigned long b)
{
return a + ((unsigned long long)b << 32);
}

results in  addic 4,4,0 ; addze 3,5   while it could just be  add 3,5  ]

> 2) if it triggers rarely, should combine just try and give a new UID to
> i1?

That should in principle works, sure.  Seems too dangerous for stage3
though.  combine cannot create UIDs as things stand now.

> What makes that hard?  Or should it just skip the SET_INSN_DELETED
> on i1 unless it is added to the instruction stream?

I have tried that first, but it blew up in other ways.  I didn't look
too closely, sorry.

It really is quite dangerous to allow combining just two original insns to
two other insns; there are arguments why it cannot loop in this case (it
always moves a parallel down, for example) but that is quite fragile, and
also not documented anywhere.  It also _did_ loop with some of my (perhaps
broken) patch attempts.  So I opted for the heavy hammer approach.

> That said, if (1) is true, this looks like this fix is good enough for 5
> and open branches.
> 
> Thanks David for pointing this out to me.

I need to remember to CC: people, sorry :-/

Thanks for looking,


Segher


Re: [PATCH] combine: If a parallel I2 was split, do not allow a new I2 (PR64268)

2014-12-15 Thread Paolo Bonzini


On 15/12/2014 12:05, Segher Boessenkool wrote:
> If combine is given a parallel I2, it splits it into separate I1 and
> I2 insns in some cases (one case since the beginning of time; more
> cases since my r218248).  It gives the new I1 the same UID as the I2
> already has; there is a comment that this works fine because the I1
> never is added to the insn stream.
> 
> When combine eventually replaces the insns with something new, it
> calls SET_INSN_DELETED on those it wants gone.  Since r125624 (when
> DF was added, back in 2007) SET_INSN_DELETED uses the UID of the insn
> it is called for to do the deletion for dataflow.  So since then, when
> such a split I1 is deleted, DF thinks I2 is deleted as well.  This
> of course explodes if I2 is in fact needed (but only if that I2 still
> exists at the end of combine, i.e. the insn was not combined to further
> down; that might explain why it wasn't noticed before).
> 
> This should be fixed properly, but it is somewhat involved.  This
> patch simply disallows all combinations coming from a split parallel
> I2 if it needs a new I2.  This fixes PR target/64268.
> 
> Bootstrapped on powerpc64-linux and powerpc-linux; the fails are gone.
> Regtested on powerpc64-linux -m32,-m32/-mpowerpc64,-m64,-m64/-mlra.
> Is this okay for mainline?
> 
> Segher
> 
> 
> 2014-12-15  Segher Boessenkool  
> 
> gcc/
>   * combine.c (try_combine): Don't try to split newpat if we started
>   with I2 a PARALLEL that we split.
> 
> ---
>  gcc/combine.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 8995c1d3..de2e49f 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -3471,6 +3471,17 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>if (insn_code_number < 0)
>  insn_code_number = recog_for_combine (&newpat, i3, &new_i3_notes);
>  
> +  /* If we got I1 and I2 from splitting the original (parallel) I2 into two,
> + I1 and I2 have the same UID, which means that DF ends up deleting I2
> + when it is asked to delete I1.  So only allow this if we want I2 
> deleted,
> + that is, if we get only one insn as combine result; don't try to split
> + off a new I2 if it doesn't match yet.  */
> +  if (i1 && insn_code_number < 0 && INSN_UID (i1) == INSN_UID (i2))
> +{
> +  undo_all ();
> +  return 0;
> +}
> +
>/* If we were combining three insns and the result is a simple SET
>   with no ASM_OPERANDS that wasn't recognized, try to split it into two
>   insns.  There are two ways to do this.  It can be split using a
> 

Random questions:

1) did you check that it never triggers on e.g. an x86 bootstrap, and
that it doesn't trigger too often on PPC64?

2) if it triggers rarely, should combine just try and give a new UID to
i1?  What makes that hard?  Or should it just skip the SET_INSN_DELETED
on i1 unless it is added to the instruction stream?

That said, if (1) is true, this looks like this fix is good enough for 5
and open branches.

Thanks David for pointing this out to me.

Paolo