[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 --- Comment #13 from Jakub Jelinek --- (In reply to ktkachov from comment #12) > r223115 caused a code quality regression that I'm seeing on aarch64 with PR > 68381. > > We have > int > foo (unsigned short x, unsigned short y) > { > int r; > if (__builtin_mul_overflow (x, y, )) > __builtin_abort (); > return r; > } > > and before the commit we used to generate: > foo: > uxthx0, w0 > uxthx1, w1 > mul x0, x0, x1 > cmp x0, x0, sxtw > bne .L9 > ret > .L9: > stp x29, x30, [sp, -16]! > add x29, sp, 0 > bl abort > > but after r223115 at -O3 on an aarch64 toolchain we generate: > foo: > uxthw0, w0 > uxthw1, w1 > stp x29, x30, [sp, -16]! > umull x0, w0, w1 > add x29, sp, 0 > tbnzw0, #31, .L6 > mov w2, 0 > cbnzw2, .L6 > ldp x29, x30, [sp], 16 > ret > .L6: > bl abort You really should consider adding {u,}{add,sub,mul}v4 and/or negv3 expanders if you have some better sequences you can generate over the generic ones on some target.
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 ktkachov at gcc dot gnu.org changed: What|Removed |Added CC||ktkachov at gcc dot gnu.org --- Comment #12 from ktkachov at gcc dot gnu.org --- r223115 caused a code quality regression that I'm seeing on aarch64 with PR 68381. We have int foo (unsigned short x, unsigned short y) { int r; if (__builtin_mul_overflow (x, y, )) __builtin_abort (); return r; } and before the commit we used to generate: foo: uxthx0, w0 uxthx1, w1 mul x0, x0, x1 cmp x0, x0, sxtw bne .L9 ret .L9: stp x29, x30, [sp, -16]! add x29, sp, 0 bl abort but after r223115 at -O3 on an aarch64 toolchain we generate: foo: uxthw0, w0 uxthw1, w1 stp x29, x30, [sp, -16]! umull x0, w0, w1 add x29, sp, 0 tbnzw0, #31, .L6 mov w2, 0 cbnzw2, .L6 ldp x29, x30, [sp], 16 ret .L6: bl abort
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 --- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org --- Author: jakub Date: Wed May 13 08:07:58 2015 New Revision: 223115 URL: https://gcc.gnu.org/viewcvs?rev=223115root=gccview=rev Log: PR target/66112 * internal-fn.c (get_min_precision): Use UNSIGNED instead of SIGNED to get precision of non-negative value. * gcc.target/i386/pr66112-1.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr66112-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/internal-fn.c trunk/gcc/testsuite/ChangeLog
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 --- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org --- Author: jakub Date: Wed May 13 08:09:01 2015 New Revision: 223116 URL: https://gcc.gnu.org/viewcvs?rev=223116root=gccview=rev Log: PR target/66112 * config/i386/i386.md (mulvmode4, umulvmode4, *umulvmode4): Use SWI248 iterator instead of SWI. (*mulvmode4_1): Use SWI48 instead of SWI. Simplify output template. Use eq_attr alternative 0 instead of match_test in length_immediate attribute computation. (*mulvhi4, *mulvhi4_1): New define_insns. * gcc.target/i386/pr66112-2.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr66112-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md trunk/gcc/testsuite/ChangeLog
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #11 from Jakub Jelinek jakub at gcc dot gnu.org --- Fixed on the trunk. Not a regression, so probably not suitable for backporting.
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Created attachment 35522 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35522action=edit gcc5-pr66112.patch Supposedly just using SWI248 instead of SWI48 iterator should fix this, though not sure about all the AMD scheduling stuff.
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 --- Comment #2 from Uroš Bizjak ubizjak at gmail dot com --- (In reply to Jakub Jelinek from comment #1) Created attachment 35522 [details] gcc5-pr66112.patch Supposedly just using SWI248 instead of SWI48 iterator should fix this, though not sure about all the AMD scheduling stuff. Just copy HImode patterns, as is already case with QI, SI/DImode (and we tolerated it). We can macroize these patterns latter, if needed at all. I wonder, why HImode patterns were left out in the first place ...
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Attachment #35522|0 |1 is obsolete|| Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2015-05-12 Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org --- Created attachment 35523 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35523action=edit gcc6-pr66112-1.patch Generic fix, to improve minimum precision computation for unsigned values.
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 --- Comment #7 from Jeremy gcc.hall at gmail dot com --- Comment on attachment 35522 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35522 gcc5-pr66112.patch Done, PR66120
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 --- Comment #3 from Jeremy gcc.hall at gmail dot com --- Related FYI, Few instructions on ARM set the overflow flag. Two that do are 32-bit add and subtract. For these two, GCC could just emit adds followed by bvs Instead it produces:- bl atoi@ add r1, r4, r0 @ tmp121, a, b cmp r0, #0 @ b, blt .L4 @, cmp r1, r4 @ tmp121, a bge .L5 @, b .L3 @ .L4: cmp r1, r4 @ tmp121, a ble .L5 @,
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 --- Comment #8 from Uroš Bizjak ubizjak at gmail dot com --- (In reply to Jakub Jelinek from comment #5) Created attachment 35524 [details] gcc6-pr66112-2.patch And i386 mulvhi4 and umulvhi4 support. For umulvhi4, I haven't found corresponding i386.md instruction that would emit mul{w}, so just changed SWI48 to SWI248 iterator in that case. No, mulw outputs to %ax:%dx pair of HImode registers! Please see [1] [1] http://x86.renejeschke.de/html/file_module_x86_id_210.html
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 --- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org --- Created attachment 35524 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35524action=edit gcc6-pr66112-2.patch And i386 mulvhi4 and umulvhi4 support. For umulvhi4, I haven't found corresponding i386.md instruction that would emit mul{w}, so just changed SWI48 to SWI248 iterator in that case.
[Bug target/66112] __builtin_mul_overflow for int16_t emits poor code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66112 --- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org --- As for ARM, please file a separate enhancement request. The generic code has the means for backends to provide better patterns, but arm doesn't use them (only i386.md uses them right now), so arm maintainers should add those if needed.