[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |NEW
         Resolution|FIXED                       |---
   Target Milestone|9.0                         |---

--- Comment #8 from Andrew Pinski ---
Well, it was just this case that was fixed.
Here is another one which is still broken:

unsigned int adds_shift_ext ( unsigned long long a, unsigned short b, unsigned c)
{
  unsigned long long d = (a - ((unsigned long long)b << 3));

  if (d == 0)
    return a + c + b;
  else
    return b + d + c;
}

Note I think there is a missed reassociation/code hoisting too.

   [local count: 536870913]:
  _3 = (unsigned int) a_11(D);
  _4 = _3 + c_13(D);
  _15 = _4 + _8;
  goto ; [100.00%]

   [local count: 536870913]:
  _7 = (unsigned int) d_12;
  _17 = _8 + c_13(D);
  _14 = _7 + _17;

c_13(D) + _8 is fully redundant here.
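To make the hoisting remark concrete, here is a hand-written C sketch. It is not compiler output, and the name adds_shift_ext_hoisted is hypothetical. It assumes only that unsigned arithmetic wraps modulo 2^32, which is what makes the reassociation safe: both branches add c + b, so that sum can be computed once before the select.

```c
/* Test case from comment #8, unchanged apart from layout. */
unsigned int adds_shift_ext(unsigned long long a, unsigned short b, unsigned c)
{
    unsigned long long d = a - ((unsigned long long)b << 3);

    if (d == 0)
        return a + c + b;
    else
        return b + d + c;
}

/* Hypothetical hand-hoisted variant: c + b is computed once and shared
   by both arms. The result is truncated to 32 bits either way, so doing
   the additions in 32-bit unsigned arithmetic gives the same value. */
unsigned int adds_shift_ext_hoisted(unsigned long long a, unsigned short b,
                                    unsigned c)
{
    unsigned long long d = a - ((unsigned long long)b << 3);
    unsigned int common = c + b;                       /* the redundant sum */
    unsigned int selected = (d == 0) ? (unsigned int)a : (unsigned int)d;

    return selected + common;
}
```

Whether GCC should do this at the GIMPLE level or leave it to a later pass is not something this sketch tries to answer; it only shows that the two forms agree.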
[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |9.0
             Status|NEW                         |RESOLVED

--- Comment #7 from Andrew Pinski ---
GCC9+ does:

        subs    x3, x0, x1, sxth 3
        add     w1, w2, w1, sxth
        add     w1, w1, w3
        add     w0, w2, w0
        csel    w0, w1, w0, ne
        ret

GCC 8 produced:

        sxth    w1, w1
        subs    x3, x0, x1, sxth 3
        add     w1, w1, w2
        add     w1, w1, w3
        add     w0, w2, w0
        csel    w0, w1, w0, ne
        ret

GCC 9's combine is able to do this:

Trying 3 -> 8:
    3: r99:SI=sign_extend(x1:HI)
      REG_DEAD x1:HI
    8: r101:DI=sign_extend(r99:SI#0)
Failed to match this instruction:
(parallel [
        (set (reg:DI 101 [ b ])
            (sign_extend:DI (reg:HI 1 x1 [ b ])))
        (set (reg/v:SI 99 [ b ])
            (sign_extend:SI (reg:HI 1 x1 [ b ])))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:DI 101 [ b ])
            (sign_extend:DI (reg:HI 1 x1 [ b ])))
        (set (reg/v:SI 99 [ b ])
            (sign_extend:SI (reg:HI 1 x1 [ b ])))
    ])
Successfully matched this instruction:
(set (reg/v:SI 99 [ b ])
    (sign_extend:SI (reg:HI 1 x1 [ b ])))
Successfully matched this instruction:
(set (reg:DI 101 [ b ])
    (sign_extend:DI (reg:HI 1 x1 [ b ])))
allowing combination of insns 3 and 8
original costs 4 + 4 = 8
replacement costs 4 + 4 = 8

So fixed by r9-2064.
[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-01-28
                 CC|                            |ramana at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #6 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
(In reply to kugan from comment #5)
> Is this sort of multiple-use potential candidate for ree pass? Haven't
> looked ree in detail yet.

IIRC REE can deal with multiple-use potential candidates, however it probably
needs some work to look into the complex zero/sign-extension patterns that the
AArch64 port has.

This is certainly not 5.0 material though now :)
[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

--- Comment #5 from kugan at gcc dot gnu.org ---
Is this sort of multiple use a potential candidate for the ree pass? I haven't
looked at ree in detail yet.
[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kugan at gcc dot gnu.org

--- Comment #1 from kugan at gcc dot gnu.org ---
According to AAPCS64
(http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf),
the unused parm register bits have unspecified value. So I think it is needed.
[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to kugan from comment #1)
> According to AAPCS64
> (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf),
> the unused parm register bits have unspecified value. So I think it is
> needed.

It is not needed because the next instruction has a sign extend, and the other
uses of the result of the sign extend only use the lower 32 bits of the
register.
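The argument above can be checked with a small C model. This is only a sketch: the helper names sxth_w and operand_sxth are hypothetical, the register is modelled as a plain uint64_t whose upper 48 bits are garbage (as AAPCS64 allows for a short parameter), and the narrowing casts rely on the usual two's complement behaviour that GCC documents for out-of-range conversions.

```c
#include <stdint.h>

/* Model of "sxth w1, w1": sign-extend bits [15:0] into the 32-bit W
   register. Writing a W register zeroes bits [63:32] of the X register. */
uint64_t sxth_w(uint64_t reg)
{
    return (uint32_t)(int32_t)(int16_t)(uint16_t)reg;
}

/* Model of an "x1, sxth" extended-register operand: the instruction
   itself sign-extends bits [15:0] to 64 bits and ignores the rest. */
int64_t operand_sxth(uint64_t reg)
{
    return (int64_t)(int16_t)(uint16_t)reg;
}
```

With these, operand_sxth(raw) and operand_sxth(sxth_w(raw)) agree for any raw register value, because the extended operand only reads the low 16 bits. Likewise the low 32 bits of operand_sxth(raw) equal sxth_w(raw), which is why a 32-bit add can take the "sxth" operand form directly instead of relying on a separate sxth.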
[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

--- Comment #3 from kugan at gcc dot gnu.org ---
But isn't w1 passed with a 16-bit value (short b) here? Am I missing something?
[Bug rtl-optimization/64537] Aarch64 redundant sxth instruction gets generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64537

--- Comment #4 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
b is used twice, once shifted left by 3 and once directly. We could write this
as

        subs    x3, x0, x1, sxth 3
        beq     .L5
        add     w0, w2, w1, sxth   <= Now extended
        add     w0, w0, w3
        ret
        .p2align 2
.L5:
        add     w0, w2, w0
        ret

which in this specific case would perhaps be more efficient, but in practice
it's quite hard to get this sort of multiple-use right.

I think this is a special case, however, of the more common 'un-cse' type of
problem, where multiple uses of an extended (or shifted) value are always
commoned up. Note that modern CPUs may take an extra cycle to perform an
ALU-with-shift type operation, eliminating the benefit of sinking multiple uses
down into the ALU operations themselves.