[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-20 Thread doko at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #19 from Matthias Klose  ---
here is a test case from an ICE I saw when backporting the patch to the gcc-5
Linaro branch. Fails with -O2, works with -O1

typedef int Nlm_Int4, ValNodePtr;
Nlm_Int4 b, e;
char c, d;
void fn1();
typedef struct { double patternProbability; } patternSearchItems;
char fn2();
void fn3();
patternSearchItems a;
void fn4() {
  Nlm_Int4 f[2];
  char g[1];
  a = *(patternSearchItems *)fn3;
  while (fn2(c, d & e, b))
if (a.patternProbability) {
  fn1(g);
  fn1(f);
}
}

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-18 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #17 from Richard Henderson  ---
Author: rth
Date: Mon Jan 18 20:56:13 2016
New Revision: 232540

URL: https://gcc.gnu.org/viewcvs?rev=232540=gcc=rev
Log:
PR target/69176

  * config/aarch64/aarch64.md (add3): Move long immediate
  operands to pseudo only if CSE is expected.  Split long immediate
  operands only after reload, and for the stack pointer.
  (*add3_pluslong): Remove.
  (*addsi3_aarch64, *adddi3_aarch64): Merge into...
  (*add3_aarch64): ... here.  Add r/rk/Upl alternative.
  (*addsi3_aarch64_uxtw): Add r/rk/Upl alternative.
  (*add3 peepholes): New.
  (*add3 splitters): New.
  * config/aarch64/constraints.md (Upl): New.
  * config/aarch64/predicates.md (aarch64_pluslong_strict_immedate): New.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.md
trunk/gcc/config/aarch64/constraints.md
trunk/gcc/config/aarch64/predicates.md

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-18 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

Richard Henderson  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #18 from Richard Henderson  ---
Fixed.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-08 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

Richard Henderson  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||rth at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |rth at gcc dot gnu.org

--- Comment #10 from Richard Henderson  ---
Created attachment 37267
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37267=edit
proposed patch

Andrew is exactly right re plus being special.

The pluslong hoops that are being jumped through are really the
domain for post-reload splitters and peepholes.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-08 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

Richard Henderson  changed:

   What|Removed |Added

  Attachment #37267|0   |1
is obsolete||

--- Comment #13 from Richard Henderson  ---
Created attachment 37281
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37281=edit
second patch

Ok, so the patch didn't survive overnight testing.

The primary issue being that stack pointer adjustments need to
be split immediately, since try_split will refuse to operate on
an instruction that modifies the stack pointer.

Thanks for the test cases in #c11, I'd forgotten about exposing
this to cse.  This adjusted patch handles them as expected.

The lack of condition on the peepholes was just a think-o.
Testing for single-insn move was always intended, just forgotten.

Full testing for this is still a couple hours off completion.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-08 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #14 from Richard Henderson  ---
(In reply to Wilco from comment #12)
> The only remaining question I had whether it would be possible to use
> peephole expansions rather than the late splits. If they are evaluated in
> order then if the first doesn't match or if there is no temporary, the 2nd
> one will split into 2 adds.

Why would you want to do this as peepholes?

I don't know what you're thinking of, but I'm pretty sure
it's already set up like you want: peep2 pass changes the
insns to use temporaries when possible, then splitting to
2 adds happens for any remaining long constants.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-08 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #16 from Richard Henderson  ---
(In reply to Wilco from comment #15)
> The final split happens a few phases later, so I wondered whether it would
> be feasible to do all the splitting during peep2. There is likely no real CQ
> gain in doing so, it just might make understanding the md file easier.

No, it's not, since peep2 doesn't always run.  Whereas the splits
must be available during final for -O0.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-08 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #15 from Wilco  ---
(In reply to Richard Henderson from comment #14)
> (In reply to Wilco from comment #12)
> > The only remaining question I had whether it would be possible to use
> > peephole expansions rather than the late splits. If they are evaluated in
> > order then if the first doesn't match or if there is no temporary, the 2nd
> > one will split into 2 adds.
> 
> Why would you want to do this as peepholes?
> 
> I don't know what you're thinking of, but I'm pretty sure
> it's already set up like you want: peep2 pass changes the
> insns to use temporaries when possible, then splitting to
> 2 adds happens for any remaining long constants.

The final split happens a few phases later, so I wondered whether it would be
feasible to do all the splitting during peep2. There is likely no real CQ gain
in doing so, it just might make understanding the md file easier.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-08 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #12 from Wilco  ---
(In reply to Wilco from comment #11)

> With your patch expand always emits add instructions with complex immediates
> which then can't be optimized.

OK, so I can change your patch do the right thing with 2 minor changes:

* expand add3 should use aarch64_plus_immediate
* the peepholes should have aarch64_move_imm (INTVAL (operands[2]), mode)
as the condition (we want to expand into add+add rather than mov+movk+add...).

With this the codesize of the example reduces by an extra 8% :-)

The only remaining question I had whether it would be possible to use peephole
expansions rather than the late splits. If they are evaluated in order then if
the first doesn't match or if there is no temporary, the 2nd one will split
into 2 adds.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-08 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #11 from Wilco  ---
(In reply to Richard Henderson from comment #10)
> Created attachment 37267 [details]
> proposed patch
> 
> Andrew is exactly right re plus being special.
> 
> The pluslong hoops that are being jumped through are really the
> domain for post-reload splitters and peepholes.

Thanks for the patch, it fixes the reported issue. However it regresses the
examples that motivated the original patch:

int g3(int x, int y, int z)
{
  x += 0x1;
  y += 0x1;
  z += 0x1;
  return x * y ^ z;
}

int g4(int x, int y, int z)
{
  x += 0x1;
  return x;
}

These now produce:

g3:
mov w3, 21845
mov w4, 21845
movkw3, 0x1, lsl 16
movkw4, 0x1, lsl 16
add w0, w0, w3
add w1, w1, w4
mov w5, 21845
movkw5, 0x1, lsl 16
mul w0, w0, w1
add w2, w2, w5
eor w0, w0, w2
ret

g4:
mov w1, 21845
movkw1, 0x1, lsl 16
add w0, w0, w1
ret

The first case should have CSEd the complex immediate. The 2nd should have
split into 2 adds.

The idea of the original patch was to expand single-instruction adds, so any
complex immediates are explicit and get CSEd/lifted. Then allow combine to find
single-use single-block complex immediates that can be merged with an add and
split into a 2-instruction add expansion. This only works if you have a
define_insn_and_split for the complex immediate, with just a split combine
doesn't do it.

With your patch expand always emits add instructions with complex immediates
which then can't be optimized.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-07 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-01-07
 Ever confirmed|0   |1

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-07 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

Andrew Pinski  changed:

   What|Removed |Added

 CC||jgreenhalgh at gcc dot gnu.org

--- Comment #5 from Andrew Pinski  ---
This was introduced by:
2015-11-24  Wilco Dijkstra  

* config/aarch64/aarch64.md (add3):
Block early expansion into 2 add instructions.
(add3_pluslong): New pattern to combine complex
immediates into 2 additions.


2015-12-04  James Greenhalgh  

* config/aarch64/aarch64.md (add3_pluslong): Add register
constraints.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-07 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #8 from Andrew Pinski  ---
(In reply to Wilco from comment #7)
> > > I think the problem is the constraints on *add3_pluslong allows all 
> > > immediates.
> > 
> > I'm not sure what you mean here - there are 4 constraints that should all be
> > true before the instruction is matched: GPI, aarch64_pluslong_immediate, 'i'
> > and "!aarch64_plus_operand (operands[2], VOIDmode) && !aarch64_move_imm
> > (INTVAL (operands[2]), mode)".
> 
> It appears reload creates the instruction without doing the last check -
> this is incorrect as it might be an instruction that is disabled (eg. not
> supported by the selected architecture)...
> 
> However a trivial workaround is to always expand the pattern by changing the
> "&& true" into "true". I'll post a patch.

plus patterns are special to reload (this is documented IIRC). So I think there
should be only one plus pattern for DI mode.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-07 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #9 from Wilco  ---
(In reply to Andrew Pinski from comment #8)
> (In reply to Wilco from comment #7)
> > > > I think the problem is the constraints on *add3_pluslong allows 
> > > > all immediates.
> > > 
> > > I'm not sure what you mean here - there are 4 constraints that should all 
> > > be
> > > true before the instruction is matched: GPI, aarch64_pluslong_immediate, 
> > > 'i'
> > > and "!aarch64_plus_operand (operands[2], VOIDmode) && !aarch64_move_imm
> > > (INTVAL (operands[2]), mode)".
> > 
> > It appears reload creates the instruction without doing the last check -
> > this is incorrect as it might be an instruction that is disabled (eg. not
> > supported by the selected architecture)...
> > 
> > However a trivial workaround is to always expand the pattern by changing the
> > "&& true" into "true". I'll post a patch.
> 
> plus patterns are special to reload (this is documented IIRC). So I think
> there should be only one plus pattern for DI mode.

I see, that would explain why it doesn't check the condition. Merging into a
single instruction should be possible. That still wouldn't check the condition
of course, so we'd need to split anything that isn't a single add (given this
odd reload case is so rare, it's not an issue).

There is also a secondary issue in that the large number of reloads in the
example generate quite inefficient code. With a few tweaks I got a 5% reduction
in instruction count.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-06 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #2 from Andrew Pinski  ---
Note -g is not needed to reproduce the bug and speeds up the compiling a lot.

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-06 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

Andrew Pinski  changed:

   What|Removed |Added

 CC||pinskia at gcc dot gnu.org
   Target Milestone|--- |6.0

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-06 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176

--- Comment #1 from Andrew Pinski  ---
Happens with r231970 also.