[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #19 from Matthias Klose --- here is a test case from an ICE I saw when backporting the patch to the gcc-5 Linaro branch. Fails with -O2, works with -O1 typedef int Nlm_Int4, ValNodePtr; Nlm_Int4 b, e; char c, d; void fn1(); typedef struct { double patternProbability; } patternSearchItems; char fn2(); void fn3(); patternSearchItems a; void fn4() { Nlm_Int4 f[2]; char g[1]; a = *(patternSearchItems *)fn3; while (fn2(c, d & e, b)) if (a.patternProbability) { fn1(g); fn1(f); } }
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #17 from Richard Henderson --- Author: rth Date: Mon Jan 18 20:56:13 2016 New Revision: 232540 URL: https://gcc.gnu.org/viewcvs?rev=232540=gcc=rev Log: PR target/69176 * config/aarch64/aarch64.md (add3): Move long immediate operands to pseudo only if CSE is expected. Split long immediate operands only after reload, and for the stack pointer. (*add3_pluslong): Remove. (*addsi3_aarch64, *adddi3_aarch64): Merge into... (*add3_aarch64): ... here. Add r/rk/Upl alternative. (*addsi3_aarch64_uxtw): Add r/rk/Upl alternative. (*add3 peepholes): New. (*add3 splitters): New. * config/aarch64/constraints.md (Upl): New. * config/aarch64/predicates.md (aarch64_pluslong_strict_immedate): New. Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64.md trunk/gcc/config/aarch64/constraints.md trunk/gcc/config/aarch64/predicates.md
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 Richard Henderson changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #18 from Richard Henderson --- Fixed.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 Richard Henderson changed: What|Removed |Added Status|NEW |ASSIGNED CC||rth at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |rth at gcc dot gnu.org --- Comment #10 from Richard Henderson --- Created attachment 37267 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37267=edit proposed patch Andrew is exactly right re plus being special. The pluslong hoops that are being jumped through are really the domain for post-reload splitters and peepholes.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 Richard Henderson changed: What|Removed |Added Attachment #37267|0 |1 is obsolete|| --- Comment #13 from Richard Henderson --- Created attachment 37281 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37281=edit second patch Ok, so the patch didn't survive overnight testing. The primary issue being that stack pointer adjustments need to be split immediately, since try_split will refuse to operate on an instruction that modifies the stack pointer. Thanks for the test cases in #c11, I'd forgotten about exposing this to cse. This adjusted patch handles them as expected. The lack of condition on the peepholes was just a think-o. Testing for single-insn move was always intended, just forgotten. Full testing for this is still a couple hours off completion.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #14 from Richard Henderson --- (In reply to Wilco from comment #12) > The only remaining question I had whether it would be possible to use > peephole expansions rather than the late splits. If they are evaluated in > order then if the first doesn't match or if there is no temporary, the 2nd > one will split into 2 adds. Why would you want to do this as peepholes? I don't know what you're thinking of, but I'm pretty sure it's already set up like you want: peep2 pass changes the insns to use temporaries when possible, then splitting to 2 adds happens for any remaining long constants.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #16 from Richard Henderson --- (In reply to Wilco from comment #15) > The final split happens a few phases later, so I wondered whether it would > be feasible to do all the splitting during peep2. There is likely no real CQ > gain in doing so, it just might make understanding the md file easier. No, it's not, since peep2 doesn't always run. Whereas the splits must be available during final for -O0.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #15 from Wilco --- (In reply to Richard Henderson from comment #14) > (In reply to Wilco from comment #12) > > The only remaining question I had whether it would be possible to use > > peephole expansions rather than the late splits. If they are evaluated in > > order then if the first doesn't match or if there is no temporary, the 2nd > > one will split into 2 adds. > > Why would you want to do this as peepholes? > > I don't know what you're thinking of, but I'm pretty sure > it's already set up like you want: peep2 pass changes the > insns to use temporaries when possible, then splitting to > 2 adds happens for any remaining long constants. The final split happens a few phases later, so I wondered whether it would be feasible to do all the splitting during peep2. There is likely no real CQ gain in doing so, it just might make understanding the md file easier.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #12 from Wilco --- (In reply to Wilco from comment #11) > With your patch expand always emits add instructions with complex immediates > which then can't be optimized. OK, so I can change your patch do the right thing with 2 minor changes: * expand add3 should use aarch64_plus_immediate * the peepholes should have aarch64_move_imm (INTVAL (operands[2]), mode) as the condition (we want to expand into add+add rather than mov+movk+add...). With this the codesize of the example reduces by an extra 8% :-) The only remaining question I had whether it would be possible to use peephole expansions rather than the late splits. If they are evaluated in order then if the first doesn't match or if there is no temporary, the 2nd one will split into 2 adds.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #11 from Wilco --- (In reply to Richard Henderson from comment #10) > Created attachment 37267 [details] > proposed patch > > Andrew is exactly right re plus being special. > > The pluslong hoops that are being jumped through are really the > domain for post-reload splitters and peepholes. Thanks for the patch, it fixes the reported issue. However it regresses the examples that motivated the original patch: int g3(int x, int y, int z) { x += 0x1; y += 0x1; z += 0x1; return x * y ^ z; } int g4(int x, int y, int z) { x += 0x1; return x; } These now produce: g3: mov w3, 21845 mov w4, 21845 movkw3, 0x1, lsl 16 movkw4, 0x1, lsl 16 add w0, w0, w3 add w1, w1, w4 mov w5, 21845 movkw5, 0x1, lsl 16 mul w0, w0, w1 add w2, w2, w5 eor w0, w0, w2 ret g4: mov w1, 21845 movkw1, 0x1, lsl 16 add w0, w0, w1 ret The first case should have CSEd the complex immediate. The 2nd should have split into 2 adds. The idea of the original patch was to expand single-instruction adds, so any complex immediates are explicit and get CSEd/lifted. Then allow combine to find single-use single-block complex immediates that can be merged with an add and split into a 2-instruction add expansion. This only works if you have a define_insn_and_split for the complex immediate, with just a split combine doesn't do it. With your patch expand always emits add instructions with complex immediates which then can't be optimized.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 ktkachov at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2016-01-07 Ever confirmed|0 |1 --- Comment #3 from ktkachov at gcc dot gnu.org --- Confirmed.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 Andrew Pinski changed: What|Removed |Added CC||jgreenhalgh at gcc dot gnu.org --- Comment #5 from Andrew Pinski --- This was introduced by: 2015-11-24 Wilco Dijkstra* config/aarch64/aarch64.md (add3): Block early expansion into 2 add instructions. (add3_pluslong): New pattern to combine complex immediates into 2 additions. 2015-12-04 James Greenhalgh * config/aarch64/aarch64.md (add3_pluslong): Add register constraints.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #8 from Andrew Pinski --- (In reply to Wilco from comment #7) > > > I think the problem is the constraints on *add3_pluslong allows all > > > immediates. > > > > I'm not sure what you mean here - there are 4 constraints that should all be > > true before the instruction is matched: GPI, aarch64_pluslong_immediate, 'i' > > and "!aarch64_plus_operand (operands[2], VOIDmode) && !aarch64_move_imm > > (INTVAL (operands[2]), mode)". > > It appears reload creates the instruction without doing the last check - > this is incorrect as it might be an instruction that is disabled (eg. not > supported by the selected architecture)... > > However a trivial workaround is to always expand the pattern by changing the > "&& true" into "true". I'll post a patch. plus patterns are special to reload (this is documented IIRC). So I think there should be only one plus pattern for DI mode.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #9 from Wilco --- (In reply to Andrew Pinski from comment #8) > (In reply to Wilco from comment #7) > > > > I think the problem is the constraints on *add3_pluslong allows > > > > all immediates. > > > > > > I'm not sure what you mean here - there are 4 constraints that should all > > > be > > > true before the instruction is matched: GPI, aarch64_pluslong_immediate, > > > 'i' > > > and "!aarch64_plus_operand (operands[2], VOIDmode) && !aarch64_move_imm > > > (INTVAL (operands[2]), mode)". > > > > It appears reload creates the instruction without doing the last check - > > this is incorrect as it might be an instruction that is disabled (eg. not > > supported by the selected architecture)... > > > > However a trivial workaround is to always expand the pattern by changing the > > "&& true" into "true". I'll post a patch. > > plus patterns are special to reload (this is documented IIRC). So I think > there should be only one plus pattern for DI mode. I see, that would explain why it doesn't check the condition. Merging into a single instruction should be possible. That still wouldn't check the condition of course, so we'd need to split anything that isn't a single add (given this odd reload case is so rare, it's not an issue). There is also a secondary issue in that the large number of reloads in the example generate quite inefficient code. With a few tweaks I got a 5% reduction in instruction count.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #2 from Andrew Pinski --- Note -g is not needed to reproduce the bug and speeds up the compiling a lot.
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 Andrew Pinski changed: What|Removed |Added CC||pinskia at gcc dot gnu.org Target Milestone|--- |6.0
[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #1 from Andrew Pinski --- Happens with r231970 also.