[Bug target/82989] [7/8 regression] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #18 from sudi at gcc dot gnu.org --- Created bug 84467 to continue discussions about the early expand phase decisions to choose or reject NEON operations
[Bug target/82989] [7/8 regression] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #17 from sudi at gcc dot gnu.org --- Since this looks like a pretty invasive problem, according to my discussions with Wilco and Kyrill, I think I will try to propose a smaller, but temporary fix using the ?s and special casing 32 for this PR (which could go in sooner). I will also open a new PR to handle this at the expand phase and clean up the code aimed at gcc 9.
[Bug target/82989] [7/8 regression] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #16 from Wilco --- (In reply to Matthijs van Duin from comment #15) > (In reply to Wilco from comment #14) > > Yes on older cores it can be a bad idea to allow accidental use of Neon > > instructions. The simplest workaround is to switch off Neon, just use > > -mfpu=vfp. > > Sure, though that's not exactly ideal since it is very desirable for the > Neon unit to be used for single-precision floats. (9-10 cycles for add/sub > in VFP, 10-12 cycles for mul in VFP, 1 cycle for two add/sub/mul ops in Neon) That's why it's a workaround until the bug is fixed - assuming the performance critical functions use both 64-bit integer and 32-bit fp operations, you could benchmark both options and choose the one that works best for now.
[Bug target/82989] [7/8 regression] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #15 from Matthijs van Duin --- (In reply to Wilco from comment #14) > Yes on older cores it can be a bad idea to allow accidental use of Neon > instructions. The simplest workaround is to switch off Neon, just use > -mfpu=vfp. Sure, though that's not exactly ideal since it is very desirable for the Neon unit to be used for single-precision floats. (9-10 cycles for add/sub in VFP, 10-12 cycles for mul in VFP, 1 cycle for two add/sub/mul ops in Neon)
[Bug target/82989] [7/8 regression] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #14 from Wilco --- (In reply to Matthijs van Duin from comment #13) > In case it's of interest, I did a quick benchmark of my testcase executed in > a loop on a cortex-a8: > > Without neon: > 12 instructions/iteration > 14 cycles/iteration > > With neon: > 14 instructions/iteration > 35.2-35.3 cycles/iteration > > (This includes 4 instructions for the loop itself.) > > When using neon, the majority of the time is spent in a nasty pipeline stall > for moving data from neon registers to arm registers, which takes a minimum > of 20 cycles according to the cortex-a8 TRM. Yes on older cores it can be a bad idea to allow accidental use of Neon instructions. The simplest workaround is to switch off Neon, just use -mfpu=vfp. We probably also need to block the register allocator from spilling integer registers to the FP register file as that would have the same stall (another thing it really seems to insist on for some odd reason). There are AArch64 patches for this that could be ported to Arm.
[Bug target/82989] [7/8 regression] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #13 from Matthijs van Duin --- In case it's of interest, I did a quick benchmark of my testcase executed in a loop on a cortex-a8: Without neon: 12 instructions/iteration 14 cycles/iteration With neon: 14 instructions/iteration 35.2-35.3 cycles/iteration (This includes 4 instructions for the loop itself.) When using neon, the majority of the time is spent in a nasty pipeline stall for moving data from neon registers to arm registers, which takes a minimum of 20 cycles according to the cortex-a8 TRM.
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #12 from Wilco --- (In reply to Jakub Jelinek from comment #11) > From my completely ARM unaware POV, if NEON is available, it isn't a strict > requirement that NEON must never be used for this, just a matter of > preferences. So perhaps one or two further ?s on the avoid_neon_for_64bits > alternatives to let the RA know it should prefer the GPR alternative which > already contains a single ? and therefore right now is equal preference to > the avoid_neon_for_64bits. The costing model for preferencing is extremely complex and suboptimal. There are too many bugs in it, it all appears to be written for a CISC ISA (eg. it assumes it can just replace any register operand with a MEM at no extra cost, even if such a pattern doesn't exist). So hacking in some more ?'s is never going to result in good code. The issue is also made more complex by the fact that expanding 64-bit operations after register allocation is an extremely bad idea - see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308. So today the choice is between generating inefficient ARM code or inefficient Neon code... By making this a hard choice early on (only generate ARM, or use Neon when beneficial) we can actually generate high quality code. I've got patches that give huge gains and still allow a user to prefer Neon instructions with -mneon-for-64bits.
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 Jakub Jelinek changed: What|Removed |Added CC||vmakarov at gcc dot gnu.org --- Comment #11 from Jakub Jelinek --- >From my completely ARM unaware POV, if NEON is available, it isn't a strict requirement that NEON must never be used for this, just a matter of preferences. So perhaps one or two further ?s on the avoid_neon_for_64bits alternatives to let the RA know it should prefer the GPR alternative which already contains a single ? and therefore right now is equal preference to the avoid_neon_for_64bits.
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #10 from Wilco --- The only solution to this is to make a hard choice between Neon and integer registers before register allocation.
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 sudi at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #9 from Matthijs van Duin --- I can still reproduce the same test case with: arm-linux-gnueabihf-gcc-8 (Debian 8-20180207-2) 8.0.1 20180207 (experimental) [trunk revision 257435] -mfloat-abi=hard is implicit for arm-linux-gnueabihf, although overriding that to -mfloat-abi=softfp produces exactly the same result for me. (-mfloat-abi=soft completely disables all use of VFP/Neon hence obviously suppresses this problem.)
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #8 from Jakub Jelinek --- In both adddi3_neon and lshrdi3_neon the GPR alternatives use exactly the same number of ?s as the NEON ones with avoid_neon_for_64bits, just the neon_for_64bits alternatives don't have any. So I don't see how it is a preference of not using NEON, when TARGET_PREFER_NEON_64BITS is true, neon is clearly preferred, otherwise there is no preferencing at all. And indeed, for DImode shifts by 32 the GPR regs should be prefered. Anyway, leaving this to ARM maintainers, don't know about the arch enough to do something.
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #7 from Jakub Jelinek --- Indeed, with -march=armv7-a -mcpu=cortex-a8 -mfpu=neon -O2 pr82989.c -mfloat-abi=hard -mprint-tune-info I can reproduce. prefer_neon_for_64bits seems to be false, and LRA first says: Choosing alt 4 in insn 7: (0) ?&r (1) r (2) i (3) X (4) X (5) X {lshrdi3_neon} which I assume would be with GPRs, but then decides to pick: Choosing alt 6 in insn 7: (0) ?w (1) w (2) i (3) X (4) X (5) X {lshrdi3_neon} instead. That alternative has avoid_neon_for_64bits arch, even when it uses neon registers, just has ? to disparage it slightly.
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #6 from ktkachov at gcc dot gnu.org --- (In reply to Jakub Jelinek from comment #5) > Just double-checked this on current trunk and I still can't reproduce this. > Nor with r248000, r245631 nor 244212 I have around (all crosses). > > If anyone can reproduce it, can you please have a look at it? It needs an -mfloat-abi=hard to reproduce.
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 Jakub Jelinek changed: What|Removed |Added CC||ktkachov at gcc dot gnu.org, ||nickc at gcc dot gnu.org, ||rearnsha at gcc dot gnu.org --- Comment #5 from Jakub Jelinek --- Just double-checked this on current trunk and I still can't reproduce this. Nor with r248000, r245631 nor 244212 I have around (all crosses). If anyone can reproduce it, can you please have a look at it?
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 Richard Biener changed: What|Removed |Added Target Milestone|7.3 |7.4 --- Comment #4 from Richard Biener --- GCC 7.3 is being released, adjusting target milestone.
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- Can't reproduce with current trunk (admittedly cross from x86_64-linux): ./cc1 -quiet -nostdinc -march=armv7-a -mcpu=cortex-a8 -mfpu=neon -O2 pr82989.c ldrdr2, [r0] mov r1, #0 addsr2, r3, r2 adc r3, r1, r3 strdr2, [r0] bx lr
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 Richard Biener changed: What|Removed |Added Target Milestone|--- |7.3
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 --- Comment #2 from Ramana Radhakrishnan --- Works as expected in GCC 6.
[Bug target/82989] [7/8 regression ] Inexplicable use of NEON for 64-bit math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82989 Ramana Radhakrishnan changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-11-23 CC||ramana at gcc dot gnu.org Summary|Inexplicable use of NEON|[7/8 regression ] |for 64-bit math |Inexplicable use of NEON ||for 64-bit math Ever confirmed|0 |1 Known to fail||7.2.1, 8.0 --- Comment #1 from Ramana Radhakrishnan --- Confirmed.