[Bug tree-optimization/102486] __builtin_popcount(y&-y) is not optimized to 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102486

Luc Van Oostenryck changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luc.vanoostenryck at gmail dot com

--- Comment #1 from Luc Van Oostenryck ---
when y != 0
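A minimal sketch of the identity behind this report, assuming GCC or Clang for `__builtin_popcount`. `y & -y` keeps only the lowest set bit of `y`, so its population count is 1 for every nonzero `y` and 0 for `y == 0`. That is why folding the expression to 1 is only valid under the `y != 0` condition. The helper name `pop_lowbit` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: population count of the lowest set bit of y.
   Unsigned negation is used so that -y is well defined for every value
   (modular arithmetic), including the bit pattern of INT_MIN. */
static int pop_lowbit(unsigned y)
{
    /* y & -y keeps only the lowest set bit of y, or nothing if y == 0. */
    return __builtin_popcount(y & -y);
}
```

So `pop_lowbit(12u)` is 1 (only bit 2 survives), while `pop_lowbit(0u)` is 0.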
[Bug rtl-optimization/100377] needless stack adjustment when passing struct in register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100377

--- Comment #3 from Luc Van Oostenryck ---
> I thought there was one which I filed which is much older than those but I
> can't find it.

Probably also related to PR36409 and PR49157.
[Bug rtl-optimization/100378] New: [Regression 9/10/11/12] arm64: lsl + asr used instead of sxth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100378

Bug ID: 100378
Summary: [Regression 9/10/11/12] arm64: lsl + asr used instead of sxth
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: luc.vanoostenryck at gmail dot com
Target Milestone: ---

Created attachment 50727
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50727=edit
testcase

On arm64, when compiling with optimization (for example with -O2), the following code:

    struct sh {
        short a;
        short b;
        short y[2];
    };

    int fooh(struct sh s)
    {
        return s.a;
    }

has produced the following assembly code since GCC 9.x:

    fooh:
        lsl     x0, x0, 16
        asr     w0, w0, 16
        ret

but GCC 8.x and earlier produce the shorter:

    fooh(sh):
        sxth    w0, w0
        ret

See https://gcc.godbolt.org/z/YrW7E3cro
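To see why one instruction is enough, here is a small C model of the two sequences. Per the assembly above, `s.a` arrives in the low 16 bits of `x0`, and both sequences just sign-extend that halfword. The sketch assumes GCC's semantics for the implementation-defined parts (unsigned-to-signed conversion wraps, `>>` on a negative signed value is arithmetic); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What a single "sxth w0, w0" computes: sign-extend the low 16 bits. */
static int32_t sext16_cast(uint64_t x0)
{
    return (int16_t)(uint16_t)x0;
}

/* What the GCC 9+ pair computes: move the halfword to the top of the
   32-bit register, then arithmetic-shift it back down. */
static int32_t sext16_shifts(uint64_t x0)
{
    uint32_t w0 = (uint32_t)(x0 << 16); /* lsl x0, x0, 16, then read w0 */
    return (int32_t)w0 >> 16;           /* asr w0, w0, 16 */
}
```

Both functions agree on every input, which is the point of the report: the two-instruction form does the work of `sxth`.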
[Bug rtl-optimization/100377] New: needless stack adjustment when passing struct in register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100377

Bug ID: 100377
Summary: needless stack adjustment when passing struct in register
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: luc.vanoostenryck at gmail dot com
Target Milestone: ---

Created attachment 50726
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50726=edit
testcases

When compiling with optimization (for example -O2), the following code:

    struct sb {
        signed char a;
        char b;
        short y[3];
    };
    struct ub {
        unsigned char a;
        char b;
        short y[3];
    };

    int fsb(struct sb s)
    {
        return s.a;
    }

    int fub(struct ub s)
    {
        return s.a;
    }

produces the following assembly code on arm64:

    fsb:
        sub     sp, sp, #16
        sxtb    w0, w0
        add     sp, sp, 16
        ret
    fub:
        sub     sp, sp, #16
        and     w0, w0, 255
        add     sp, sp, 16
        ret

the following on mips64:

    fsb:
        daddiu  $sp,$sp,-16
        dsll    $2,$4,56
        dsra    $2,$2,56
        j       $31
        daddiu  $sp,$sp,16
    fub:
        daddiu  $sp,$sp,-16
        andi    $2,$4,0xff
        j       $31
        daddiu  $sp,$sp,16

and the following on riscv64:

    fsb:
        addi    sp,sp,-16
        slli    a0,a0,24
        srai    a0,a0,24
        addi    sp,sp,16
        jr      ra
    fub:
        addi    sp,sp,-16
        andi    a0,a0,0xff
        addi    sp,sp,16
        jr      ra

OTOH, things seem OK on ppc64:

    fsb:
        extsb 3,3
        blr
    fub:
        rlwinm 3,3,0,0xff
        blr

and on x86_64:

    fsb:
        movsx   eax, dil
        ret
    fub:
        movzx   eax, dil
        ret

Similar problems happen on 32-bit platforms too. For example, on arm32 the following code:

    struct ub32 {
        unsigned char a;
        char b;
        short y[1];
    };

    int fub32(struct ub32 s)
    {
        return s.a;
    }

produces:

    fub32:
        sub     sp, sp, #8
        uxtb    r0, r0
        add     sp, sp, #8
        bx      lr

All of these seem to happen with all versions. See https://gcc.godbolt.org/z/x9zc1EnYn

Note: similar PRs exist, but they were reported for x86_64 only.
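The core of the report is that each argument fits in a single register, so no stack space should be reserved. A small C sketch makes that explicit, assuming a mainstream ABI where `short` is 2 bytes with 2-byte alignment, and a little-endian target such as arm64 or riscv64. The struct sizes are checked at compile time, and the hypothetical `fsb_from_reg`/`fub_from_reg` helpers model what `fsb`/`fub` compute from the register image: a sign or zero extension of the low byte, and nothing else.

```c
#include <assert.h>
#include <stdint.h>

struct sb   { signed char a;   char b; short y[3]; };
struct ub   { unsigned char a; char b; short y[3]; };
struct ub32 { unsigned char a; char b; short y[1]; };

/* Assumption: short is 2 bytes with 2-byte alignment, so each struct is
   exactly one general-purpose register wide on its target. */
_Static_assert(sizeof(struct sb) == 8, "struct sb fits in one 64-bit register");
_Static_assert(sizeof(struct ub) == 8, "struct ub fits in one 64-bit register");
_Static_assert(sizeof(struct ub32) == 4, "struct ub32 fits in one 32-bit register");

/* Hypothetical models: on a little-endian target s.a is the lowest byte
   of the incoming register, so the whole function is one extension.
   The signed char conversion relies on GCC's wrap-around behaviour. */
static int fsb_from_reg(uint64_t reg) { return (signed char)(reg & 0xff); } /* sxtb */
static int fub_from_reg(uint64_t reg) { return (int)(reg & 0xff); }         /* and 255 */
```

Nothing in either helper touches memory, which is why the `sub sp`/`add sp` pairs above are pure overhead.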
[Bug target/100075] [9/10 Regression] unneeded sign extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100075

--- Comment #4 from Luc Van Oostenryck ---
(In reply to Jakub Jelinek from comment #3)
> Fixed on the trunk. Probably shouldn't be backported.

Works great here. Thanks.
[Bug target/100056] [9/10 Regression] orr + lsl vs. [us]bfiz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100056

--- Comment #11 from Luc Van Oostenryck ---
Works nicely now. Thank you.
[Bug target/100028] [9/10 Regression] arm64 failure to generate bfxil
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100028

--- Comment #8 from Luc Van Oostenryck ---
Works nicely now. Thanks.
[Bug target/100075] New: [9/10/11 Regression] unneeded sign extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100075

Bug ID: 100075
Summary: [9/10/11 Regression] unneeded sign extension
Product: gcc
Version: 10.3.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luc.vanoostenryck at gmail dot com
Target Milestone: ---
Target: aarch64

Created attachment 50588
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50588=edit
test case

Up to GCC 8, the following code:

    struct s { short x, y; };

    struct s rot90(struct s p)
    {
        return (struct s) { -p.y, p.x };
    }

was translated to:

    rot90:
        neg     w1, w0, asr 16
        and     w1, w1, 65535
        orr     w0, w1, w0, lsl 16
        ret

but since GCC 9 it is translated less nicely, with an unneeded sign extension:

    rot90:
        mov     w1, w0
        sbfx    x0, x1, 16, 16
        neg     w0, w0
        bfi     w0, w1, 16, 16
        ret

See another variant in the attachment or at https://gcc.godbolt.org/z/1oW1cEMGc
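Why the sign extension is unneeded can be checked with a bit-level C model of the GCC 8 sequence. The sketch assumes the little-endian aarch64 register image of `struct s` (x in bits 0-15, y in bits 16-31); `pack` and `rot90_gcc8` are hypothetical names. Only the low 16 bits of the negation survive the `and`, and those bits do not depend on how `y` was extended, so a logical shift gives the same result as `asr` (or `sbfx`).

```c
#include <assert.h>
#include <stdint.h>

struct s { short x, y; };

struct s rot90(struct s p)
{
    return (struct s) { -p.y, p.x };
}

/* Hypothetical helper: register image of struct s on little-endian aarch64. */
static uint32_t pack(struct s p)
{
    return (uint16_t)p.x | ((uint32_t)(uint16_t)p.y << 16);
}

/* Model of the GCC 8 code:
     neg w1, w0, asr 16
     and w1, w1, 65535
     orr w0, w1, w0, lsl 16
   A logical shift is used here on purpose: the low 16 bits of -y are the
   same whether y was sign- or zero-extended first. */
static uint32_t rot90_gcc8(uint32_t w0)
{
    uint32_t w1 = (0u - (w0 >> 16)) & 0xffff;
    return w1 | (w0 << 16);
}
```

For example, `rot90_gcc8(pack(p))` equals `pack(rot90(p))` for `p = {1, 2}` (both give `0x0001FFFE`).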
[Bug target/100072] New: [10/11 Regression] csel vs. csetm + and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100072

Bug ID: 100072
Summary: [10/11 Regression] csel vs. csetm + and
Product: gcc
Version: 10.3.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luc.vanoostenryck at gmail dot com
Target Milestone: ---
Target: aarch64

Created attachment 50587
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50587=edit
testcase

The following code:

    int sel_andn(int p, int a) { return (p ? ~0 : 0) & a; }
    int sel_andr(int p, int a) { return (p ? 0 : ~0) & a; }

was translated to the following with GCC 9 and before:

    sel_andn:
        cmp     w0, 0
        csel    w0, w1, wzr, ne
        ret
    sel_andr:
        cmp     w0, 0
        csel    w0, w1, wzr, eq
        ret

but since version 10 it is translated into:

    sel_andn:
        cmp     w0, 0
        csetm   w0, ne
        and     w0, w0, w1
        ret
    sel_andr:
        cmp     w0, 0
        csetm   w0, eq
        and     w0, w0, w1
        ret

Same at https://gcc.godbolt.org/z/16fj1EYhx
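The GCC 9 output relies on a simple identity: AND-ing `a` with an all-ones-or-zero mask is the same as selecting between `a` and 0. A sketch of that equivalence in C (the `*_csel` names are hypothetical), where each `*_csel` function corresponds to one `csel` with `wzr` as the second operand:

```c
#include <assert.h>

int sel_andn(int p, int a) { return (p ? ~0 : 0) & a; }
int sel_andr(int p, int a) { return (p ? 0 : ~0) & a; }

/* Select a or zero directly instead of building a mask (csetm) and
   AND-ing it with a. */
static int sel_andn_csel(int p, int a) { return p ? a : 0; } /* csel w0, w1, wzr, ne */
static int sel_andr_csel(int p, int a) { return p ? 0 : a; } /* csel w0, w1, wzr, eq */
```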
[Bug target/100056] [9/10/11 Regression] orr + lsl vs. [us]bfiz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100056

--- Comment #7 from Luc Van Oostenryck ---
Created attachment 50585
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50585=edit
newer testcases (with 32 -> 64-bit extensions)
[Bug target/100056] [9/10/11 Regression] orr + lsl vs. [us]bfiz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100056

--- Comment #6 from Luc Van Oostenryck ---
(In reply to Jakub Jelinek from comment #3)
> Created attachment 50583 [details]
> gcc11-pr100056.patch
>
> Untested fix.

OTOH, for the signed case things seem to be OK unless the sign extension is from one of the register sizes (8, 16 & 32). See the updated testcases in the attachment.
[Bug target/100056] [9/10/11 Regression] orr + lsl vs. [us]bfiz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100056

--- Comment #5 from Luc Van Oostenryck ---
Created attachment 50584
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50584=edit
updated test cases
[Bug target/100056] [9/10/11 Regression] orr + lsl vs. [us]bfiz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100056

--- Comment #4 from Luc Van Oostenryck ---
(In reply to Jakub Jelinek from comment #3)
> Created attachment 50583 [details]
> gcc11-pr100056.patch
>
> Untested fix.

Mmmm, that works fine for the cases I had, but not in more general cases. I think that the constraint on the AND may be too tight. For example, changing things slightly to have a smaller mask:

    int or_lsl_u3(unsigned i)
    {
        i &= 7;
        return i | (i << 11);
    }

still gives:

    or_lsl_u3:
        and     w1, w0, 7
        ubfiz   w0, w0, 11, 3
        orr     w0, w0, w1
        ret

while GCC 8 gave the expected:

    or_lsl_u3:
        and     w0, w0, 7
        orr     w0, w0, w0, lsl 11
        ret

In fact, I would tend to think that the AND part should be removed from your split pattern (some kind of zero-extension seems to be needed to reproduce the problem, but that's all).
[Bug target/100056] New: [9/10/11 Regression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100056

Bug ID: 100056
Summary: [9/10/11 Regression]
Product: gcc
Version: 9.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luc.vanoostenryck at gmail dot com
Target Milestone: ---
Target: aarch64

Created attachment 50573
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50573=edit
or-shift vs. [us]bfiz

On arm64, the following code:

    unsigned or_shift(unsigned char i)
    {
        return i | (i << 11);
    }

translates to the following assembly:

    or_shift:
        and     w1, w0, 255
        ubfiz   w0, w0, 11, 8
        orr     w0, w0, w1
        ret

where the ubfiz instruction is a bit weird, since the code maps directly onto what GCC 8.x and earlier generated:

    or_shift:
        and     w0, w0, 255
        orr     w0, w0, w0, lsl 11
        ret

The same happens with a signed argument (see https://gcc.godbolt.org/z/af4zffMYa ).
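A C model of both sequences shows why `ubfiz` buys nothing here. `ubfiz wd, wn, #lsb, #width` takes the low `width` bits of `wn`, places them at bit `lsb` and zeroes everything else. Once `w0` has already been masked with 255, that is just `w0 << 11`, so the separate register and the extra instruction are redundant. The model is a sketch; the helper names are hypothetical, and it assumes that the upper bits of the incoming `w0` may hold garbage (hence the initial `and`).

```c
#include <assert.h>

/* Model of "ubfiz wd, wn, #lsb, #width".
   Assumes 0 < width < 32 and lsb + width <= 32. */
static unsigned ubfiz(unsigned wn, unsigned lsb, unsigned width)
{
    return (wn & ((1u << width) - 1)) << lsb;
}

unsigned or_shift(unsigned char i)
{
    return i | (i << 11);
}

/* GCC 8:  and w0, w0, 255 ; orr w0, w0, w0, lsl 11 */
static unsigned or_shift_gcc8(unsigned w0)
{
    w0 &= 255;
    return w0 | (w0 << 11);
}

/* GCC 9+: and w1, w0, 255 ; ubfiz w0, w0, 11, 8 ; orr w0, w0, w1 */
static unsigned or_shift_gcc9(unsigned w0)
{
    unsigned w1 = w0 & 255;
    return ubfiz(w0, 11, 8) | w1;
}
```

Both models return `or_shift((unsigned char)w0)` for any `w0`, including ones with garbage in the upper bits.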
[Bug target/100028] [9/10/11 Regression] arm64 failure to generate bfxil
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100028

--- Comment #5 from Luc Van Oostenryck ---
(In reply to Jakub Jelinek from comment #4)
> Created attachment 50571 [details]
> gcc11-pr100028.patch
>
> Untested fix.

This solves the few cases I had. Thanks.
[Bug rtl-optimization/100046] New: compare with itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100046

Bug ID: 100046
Summary: compare with itself
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: luc.vanoostenryck at gmail dot com
Target Milestone: ---

Created attachment 50569
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50569=edit
compare with itself

The attached file, reproduced here:

    int b3_06(int x, int y, int z)
    {
        int a = (x | z) ^ (y | z);
        int b = (x ^ y) & ~z;
        return a == b;
    }

The generated assembly for arm64 is:

    b3_06:
        eor     w3, w1, w0
        bic     w3, w3, w2
        cmp     w3, w3
        cset    w0, eq
        ret

So, GCC is able to see that both expressions are equivalent. Nice. But then there is this compare with itself :(

The problem seems to have existed forever, on all targets (see https://gcc.godbolt.org/z/qrYWsznof ).
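Both sides of the comparison use only bitwise operators, so each result bit depends only on the same bit of `x`, `y` and `z`. Checking the eight combinations of a single bit is therefore a complete proof that `a == b` always holds, and that `b3_06` could simply return 1. A sketch of that truth-table check (the helper name is hypothetical):

```c
#include <assert.h>

int b3_06(int x, int y, int z)
{
    int a = (x | z) ^ (y | z);
    int b = (x ^ y) & ~z;
    return a == b;
}

/* Exhaustive per-bit check of (x | z) ^ (y | z) == (x ^ y) & ~z.
   Returns 1 if the identity holds for all 8 bit combinations. */
static int identity_holds_per_bit(void)
{
    for (unsigned v = 0; v < 8; v++) {
        unsigned x = v & 1, y = (v >> 1) & 1, z = (v >> 2) & 1;
        if ((((x | z) ^ (y | z)) & 1) != (((x ^ y) & ~z) & 1))
            return 0;
    }
    return 1;
}
```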
[Bug target/100028] New: arm64 failure to generate bfxil
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100028

Bug ID: 100028
Summary: arm64 failure to generate bfxil
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luc.vanoostenryck at gmail dot com
Target Milestone: ---
Target: aarch64

Created attachment 50555
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50555=edit
should generate bfxil but doesn't

The attached code, reproduced here:

    #define W 3
    #define L 11

    int bfxil(int d, int s)
    {
        int wmask = (1 << W) - 1;
        return (d & ~wmask) | ((s >> L) & wmask);
    }

should compile to:

    bfxil:
        bfxil   w0, w1, 11, 3
        ret

but instead compiles to:

    bfxil:
        ubfx    x1, x1, 11, 3
        and     w0, w0, -8
        orr     w0, w1, w0
        ret

The problem is still present in trunk and was also present in 9.3, but wasn't in GCC 8.2 (see https://gcc.godbolt.org/z/E6z31hr9r ).
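To see that a single `bfxil` is enough, here is a C model of the instruction next to the testcase. `bfxil wd, wn, #lsb, #width` extracts `width` bits of `wn` starting at bit `lsb` and inserts them into the low bits of `wd`, leaving the other bits of `wd` unchanged. That is exactly `(d & ~wmask) | ((s >> L) & wmask)`. Because 11 + 3 < 32, the low 3 bits of `s >> 11` are the same whether the shift is arithmetic or logical. The sketch assumes GCC's arithmetic `>>` on negative `int`; `bfxil_insn` is a hypothetical name.

```c
#include <assert.h>

#define W 3
#define L 11

int bfxil(int d, int s)
{
    int wmask = (1 << W) - 1;
    return (d & ~wmask) | ((s >> L) & wmask);
}

#undef W
#undef L

/* Model of "bfxil wd, wn, #lsb, #width". Assumes 0 < width < 32. */
static unsigned bfxil_insn(unsigned wd, unsigned wn, unsigned lsb, unsigned width)
{
    unsigned mask = (1u << width) - 1;
    return (wd & ~mask) | ((wn >> lsb) & mask);
}
```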
[Bug c/92935] typeof() on an atomic type doesn't always return the corresponding unqualified type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92935

Luc Van Oostenryck changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #5 from Luc Van Oostenryck ---
The incoherence is now fixed, thanks to commit r11-5397-g768ce4f0ceb030e38427e85e483ed44330cd5da7