[Bug tree-optimization/87621] New: auto-vectorization fails for exponentiation code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87621

Bug ID: 87621
Summary: auto-vectorization fails for exponentiation code
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
Target Milestone: ---

https://godbolt.org/z/bgieBT

template <typename T>
T pow(T x, unsigned int n)
{
    if (!n)
        return 1;

    T y = 1;
    while (n > 1)
    {
        if (n%2)
            y *= x;
        x = x*x; // unsupported use in stmt
        n /= 2;
    }
    return x*y;
}

void testVec(int* x)
{
    // loop nest containing two or more consecutive inner loops cannot be vectorized
    for (int i = 0; i < 8; ++i)
        x[i] = pow(x[i], 10);
}
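For checking a vectorized build against a scalar baseline, the routine in this report can be restated as a small constexpr function. This is only a sketch: it assumes the template header stripped from the report was `template <typename T>`, and `ipow` and `test_vec` are hypothetical names, chosen so the code does not clash with `std::pow`.

```cpp
// Exponentiation by squaring, mirroring the pow() from the report.
// ipow is a hypothetical name; the template header is an assumption.
template <typename T>
constexpr T ipow(T x, unsigned int n)
{
    if (!n)
        return 1;

    T y = 1;
    while (n > 1)
    {
        if (n % 2)
            y *= x;   // low bit of n is set: multiply the accumulator
        x = x * x;    // square the base for the next bit
        n /= 2;
    }
    return x * y;
}

// Same shape as testVec() in the report: fixed trip count, constant exponent.
inline void test_vec(int* x)
{
    for (int i = 0; i < 8; ++i)
        x[i] = ipow(x[i], 10);
}

// Compile-time sanity checks of the scalar baseline.
static_assert(ipow(2, 10) == 1024, "2^10");
static_assert(ipow(3, 0) == 1, "x^0");
static_assert(ipow(-2, 3) == -8, "odd exponent, negative base");
```

Because the function is constexpr, the expected values are verified by the compiler itself, independently of whether the loop in `test_vec` gets vectorized.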
[Bug tree-optimization/87621] auto-vectorization fails for exponentiation code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87621 --- Comment #1 from krux --- Interestingly it happily unrolls the loop even with -fno-unroll-loops.
[Bug rtl-optimization/84101] [7/8/9 Regression] -O3 and -ftree-vectorize trying too hard for function returning trivial pair-of-uint64_t-structure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84101 --- Comment #4 from krux --- Also happens with pairs of floats: https://godbolt.org/z/QrP0VD
[Bug tree-optimization/87621] outer loop auto-vectorization fails for exponentiation code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87621 --- Comment #3 from krux --- Yes see the godbolt link. clang compiles it down to a few vpmulld's.
[Bug c++/63149] wrong auto deduction from braced-init-list
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63149 krux changed: What|Removed |Added CC||hoganmeier at gmail dot com --- Comment #3 from krux --- Still fails on trunk.
[Bug lto/90369] New: error: could not unlink output file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90369

Bug ID: 90369
Summary: error: could not unlink output file
Product: gcc
Version: 9.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
CC: marxin at gcc dot gnu.org
Target Milestone: ---

Tested this ARM toolchain:
http://www.freddiechopin.info/en/download/category/11-bleeding-edge-toolchain

In a very specific case I get the aforementioned error: could not unlink output file

yield.cpp:
void yield() {}

main.cpp:
void yield();
int main() { yield(); }

arm-none-eabi-g++ -o obj/main.cpp.o -c -flto -g -nostdlib -O2 main.cpp
arm-none-eabi-g++ -o obj/yield.cpp.o -c -flto -g -nostdlib -O2 yield.cpp
arm-none-eabi-gcc-ar rc obj/libFrameworkArduino.a obj/main.cpp.o obj/yield.cpp.o
arm-none-eabi-g++ -o obj/firmware.elf -T empty.ld -Wl,--gc-sections -O2 -save-temps obj/libFrameworkArduino.a

arm-none-eabi/bin/ld.exe: error: could not unlink output file

If you remove any of the -g or -save-temps flags, or merge the code into 1 file, or use the object files directly it works.
[Bug lto/90369] error: could not unlink output file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90369 --- Comment #4 from krux --- The code was automatically reduced, hence the empty linker script. Looks promising, seems like you found the cause.
[Bug debug/90441] New: [9 regression] corrupt debug info with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90441

Bug ID: 90441
Summary: [9 regression] corrupt debug info with LTO
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Keywords: needs-bisection
Severity: normal
Priority: P3
Component: debug
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
Target Milestone: ---

Originally occurred with arm-gcc 9.1.
Reproduced it with Ubuntu 19.04 gcc 9.0, works with gcc 8.3.
Couldn't reduce it further.

mk20dx128.c:
__attribute__ ((section(".vectors"), used)) _VectorsFlash[100]= { };

main.cpp:
void yield();
int main() { yield(); }

yield.cpp:
int serial3_available() {}

struct HardwareSerial3
{
    int available() { serial3_available(); }
};
HardwareSerial3 Serial3;

void yield() { serial3_available(); }

script.ld:
MEMORY
{
    FLASH (rx) : ORIGIN = 0x, LENGTH = 4K
}

SECTIONS
{
    .text : {
        . = 0;
        KEEP(*(.vectors))
        *(.text*)
    } > FLASH = 0xFF
}

gcc-9 -o mk20dx128.c.o -c -flto -g -ffunction-sections -fdata-sections -nostdlib -O2 teensy3/mk20dx128.c
g++-9 -o main.cpp.o -c -fno-exceptions -fno-rtti -flto -g -ffunction-sections -fdata-sections -nostdlib -O2 teensy3/main.cpp
g++-9 -o yield.cpp.o -c -fno-exceptions -fno-rtti -flto -g -ffunction-sections -fdata-sections -nostdlib -O2 yield.cpp
g++-9 -o firmware.elf -g -T script.ld -Wl,--gc-sections,--relax -O2 main.cpp.o mk20dx128.c.o yield.cpp.o

nm -ClS --radix=d --size-sort firmware.elf
0224 0400 T _VectorsFlashnm: DWARF error: could not find abbrev number 8

If you remove the 'HardwareSerial3 Serial3;' line the error becomes
DWARF error: info pointer extends beyond end of attributes
[Bug debug/90441] [9 regression] corrupt debug info with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90441 --- Comment #1 from krux --- Created attachment 46343 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46343&action=edit llvm-dwarfdump --verify output FWIW llvm-dwarfdump --verify shows the same errors for both versions, but for gcc-9 it can't resolve the actual strings in the DW_AT_abstract_origin lines.
[Bug debug/90441] [9 regression] corrupt debug info with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90441 --- Comment #2 from krux --- By the way, with 8.3 there is no DWARF error, but nm -l does not show any file location for _VectorsFlash either.
[Bug driver/90443] New: -flto=n on Windows results in CreateProcess: No such file or directory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90443

Bug ID: 90443
Summary: -flto=n on Windows results in CreateProcess: No such file or directory
Product: gcc
Version: 9.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: driver
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
Target Milestone: ---

Just using a dummy source:
extern "C" void _start() {}

$ arm-none-eabi-g++ -O3 -flto=2 main.cpp -nostdlib -o firmware.elf -v
lto-wrapper.exe: fatal error: CreateProcess: No such file or directory

Very unhelpful. -v lifts the curtain:

make -f Temp\ccwaSVX1.mk -j2 all
lto-wrapper.exe: fatal error: CreateProcess: No such file or directory

There is no make, esp. in arm-gcc distributions.
The error message should be improved.
[Bug debug/90441] [9 regression] corrupt debug info with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90441 --- Comment #3 from krux --- Finally tried qemu+gdb on the original code: gdb-8.2.1/gdb/dwarf2read.c:9715: internal-error: void dw2_add_symbol_to_list(symbol*, pending**): Assertion `(*listhead) == NULL || (SYMBOL_LANGUAGE ((*listhead)->symbol[0]) == SYMBOL_LANGUAGE (symbol))' failed.
[Bug debug/90441] [9/10 Regression] corrupt debug info with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90441

--- Comment #18 from krux ---
(In reply to Iain Sandoe from comment #14)
> current trunk (27), manual regeneration of the
> firmware.elf.ltrans0.ltrans.o ->
>
> (it's kinda frustrating that one can't see the link line, more tweaks are
> still needed to help debug LTO

Tell me about it. The first time I tried -save-temps I expected firmware.elf.ltrans0.o to be compiled from firmware.elf.ltrans0.s, of course. But it's not: -v shows the .s file is compiled to firmware.elf.ltrans0.ltrans.o, and I still don't really know what the other one is.
Some command lines seem to be missing from the verbose output (and the ones that are there are hard to find, maybe some color could help), and the temporary files are gone already.
[Bug lto/90523] New: lto1 segfault in arm_parse_cpu_option_name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90523

Bug ID: 90523
Summary: lto1 segfault in arm_parse_cpu_option_name
Product: gcc
Version: 9.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
CC: marxin at gcc dot gnu.org
Target Milestone: ---

Built a bleeding-edge arm-gcc toolchain. It works fine but when I tried newlib built with -flto I got a crash in lto1.

$ arm-none-eabi-g++ -o main.elf -Wl,--relax -mthumb -mcpu=cortex-m4 -O3
during IPA pass: icf
In function '__retarget_lock_acquire_recursive':
lto1: internal compiler error: Segmentation fault

#0 __strchr_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:57
#1 0x014de71a in strchr (__c=43, __s=0x0) at /usr/include/string.h:220
#2 arm_parse_cpu_option_name (list=0x1ab3400 , optname=optname@entry=0x18704ba "-mcpu", target=0x0, complain=complain@entry=true) at gcc-10-20190512/gcc/common/config/arm/arm-common.c:349
#3 0x00f8545d in arm_configure_build_target (target=0x1e7b500 , opts=0x7f3e2a00, opts_set=0x1e81100 , warn_compatible=) at gcc-10-20190512/gcc/config/arm/arm.c:3147
#4 0x00fa5b68 in arm_set_current_function (fndecl=) at gcc-10-20190512/gcc/tree.h:3186
#5 0x0097da22 in invoke_set_current_function_hook (fndecl=0x7f402400) at gcc-10-20190512/gcc/function.c:4629
#6 0x00984a48 in invoke_set_current_function_hook (fndecl=0x7f402400) at gcc-10-20190512/gcc/function.c:4788
#7 allocate_struct_function (fndecl=0x7f402400, abstract_p=) at gcc-10-20190512/gcc/function.c:4742
#8 0x00afc5ed in input_function (ib_cfg=0x7ffed9c0, ib=0x7ffed9a0, data_in=0x1f8c510, fn_decl=0x7f402400) at gcc-10-20190512/gcc/lto-streamer-in.c:1066
#9 lto_read_body_or_constructor (file_data=0x7f3ec960, data=, node=, section_type=LTO_section_function_body) at gcc-10-20190512/gcc/lto-streamer-in.c:1296
#10 0x0083d38b in cgraph_node::get_untransformed_body (this=0x7f418708) at gcc-10-20190512/gcc/cgraph.c:3570
#11 0x0144762f in ipa_icf::sem_function::init (this=0x1f61230) at gcc-10-20190512/gcc/cgraph.h:2008
#12 0x01441d12 in ipa_icf::sem_item_optimizer::parse_nonsingleton_classes (this=this@entry=0x1eca870) at gcc-10-20190512/gcc/ipa-icf.c:2776
#13 0x0144d730 in ipa_icf::sem_item_optimizer::execute (this=0x1eca870) at gcc-10-20190512/gcc/ipa-icf.c:2577
#14 0x0144e9b7 in ipa_icf::ipa_icf_driver () at gcc-10-20190512/gcc/ipa-icf.c:3698
#15 ipa_icf::pass_ipa_icf::execute (this=) at gcc-10-20190512/gcc/ipa-icf.c:3745
#16 0x00b777ea in execute_one_pass (pass=0x1ec0940) at gcc-10-20190512/gcc/passes.c:2473
#17 0x00b78517 in execute_ipa_pass_list (pass=0x1ec0940) at gcc-10-20190512/gcc/passes.c:2913
#18 0x007ab461 in do_whole_program_analysis () at gcc-10-20190512/gcc/context.h:48
#19 lto_main () at gcc-10-20190512/gcc/lto/lto.c:628
#20 0x00c472af in compile_file () at gcc-10-20190512/gcc/toplev.c:456
#21 0x0077b1e6 in do_compile () at gcc-10-20190512/gcc/toplev.c:2205
#22 toplev::main (this=this@entry=0x7ffedd86, argc=, argc@entry=24, argv=, argv@entry=0x7ffede88) at gcc-10-20190512/gcc/toplev.c:2340
#23 0x0077d9dc in main (argc=24, argv=0x7ffede88) at gcc-10-20190512/gcc/main.c:39

I'm not sure how to reduce this.
[Bug lto/90523] lto1 segfault in arm_parse_cpu_option_name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90523 --- Comment #1 from krux --- So this one must be null: https://github.com/gcc-mirror/gcc/blob/65af043/gcc/config/arm/arm.c#L3148
[Bug lto/90523] lto1 segfault in arm_parse_cpu_option_name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90523 --- Comment #3 from krux --- Possible, gcc was built with --disable-multilib --with-arch=armv7e-m --with-mode=thumb --with-float=soft. And if I replace -mcpu=cortex-m4 with -march=armv7e-m in my test command there's no crash.
[Bug target/88013] can't vectorize rgb to grayscale conversion code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88013

--- Comment #9 from krux ---
(In reply to ktkachov from comment #7)
> I tried current trunk (future GCC 9)
> GCC 9 learned to avoid excessive widening during vectorisation, which is
> what accounts for the large number of instructions you see.

Confirmed, the loop is now as described in comment #5 with trunk gcc.
Still with vshr+vmovn as mentioned by Ramana.

But by the way, the tail is completely unrolled, 15x the following, seems quite excessive to me:

        ldrb    ip, [r1, #1]    @ zero_extendqisi2
        movs    r6, #151
        ldrb    lr, [r1]        @ zero_extendqisi2
        movs    r5, #77
        ldrb    r7, [r1, #2]    @ zero_extendqisi2
        movs    r4, #28
        smulbb  ip, ip, r6
        smlabb  lr, r5, lr, ip
        add     ip, r3, #1
        smlabb  r7, r4, r7, lr
        cmp     ip, r2
        asr     r7, r7, #8
        strb    r7, [r0]
        bge     .L1

assert(n >= 16) helps a bit, but n % 16 == 0 doesn't.
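The constants in the unrolled tail (77, 151, 28 and a right shift by 8) suggest a fixed-point weighted sum per pixel. A rough scalar sketch of that computation, read off the assembly rather than taken from the original source: the names `to_gray` and `convert`, the channel order, and the 3-byte interleaved layout are all assumptions.

```cpp
#include <cstdint>

// Weighted sum with the constants visible in the assembly: the byte at
// offset 0 is scaled by 77, offset 1 by 151, offset 2 by 28, then >> 8.
// Which byte is red, green or blue is an assumption, not stated there.
constexpr std::uint8_t to_gray(std::uint8_t c0, std::uint8_t c1, std::uint8_t c2)
{
    // 77 + 151 + 28 == 256, so the result always fits in 8 bits.
    return static_cast<std::uint8_t>((c0 * 77 + c1 * 151 + c2 * 28) >> 8);
}

// Hypothetical scalar loop over n pixels of 3 interleaved bytes each.
inline void convert(std::uint8_t* dest, const std::uint8_t* src, int n)
{
    for (int i = 0; i < n; ++i)
        dest[i] = to_gray(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
}

static_assert(to_gray(255, 255, 255) == 255, "white stays white");
static_assert(to_gray(0, 0, 0) == 0, "black stays black");
```

A scalar reference like this is mainly useful for comparing the output of a vectorized build byte for byte.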
[Bug debug/90441] [9/10 Regression] corrupt debug info with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90441 --- Comment #20 from krux --- Thanks your patch worked! Just fyi: llvm-dwarfdump doesn't understand call-site info: https://bugs.llvm.org/show_bug.cgi?id=41846 Not sure if it's relevant.
[Bug debug/90441] [9/10 Regression] corrupt debug info with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90441

--- Comment #22 from krux ---
I can also reproduce it without any linker script, simplified code:

int serial3_available() {}

struct HardwareSerial3
{
    int available() { serial3_available(); }
};
HardwareSerial3 Serial3;

void yield() { serial3_available(); }

int main() { yield(); }

$ g++-9 -c -fno-exceptions -fno-rtti -flto -g -O2 main.cpp
$ g++-9 -o firmware.elf -g -O2 main.o
$ nm -ClS --radix=d --size-sort firmware.elf
4496 0001 T __libc_csu_fininm: DWARF error: could not find abbrev number 8
00016424 0001 b completed.7374
8192 0004 R _IO_stdin_used
4160 0043 T _start
4400 0093 T __libc_csu_init

But other tools are fine in this case:

$ llvm-dwarfdump-8 --verify firmware.elf
No errors.

$ gdb firmware.elf
Reading symbols from firmware.elf...
[Bug debug/90441] [9/10 Regression] corrupt debug info with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90441 --- Comment #23 from krux --- But it's so fragile, touch any part of the code and the error disappears. Like change serial3_available to void and you also get an additional symbol: 4160 0003 T mainmain.cpp:8
[Bug debug/90441] [9/10 Regression] corrupt debug info with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90441

--- Comment #24 from krux ---
objdump -dCrS also prints these errors.
It definitely fails to find the entry for main which is present according to objdump --dwarf:

<1>: Abbrev Number: 8 (DW_TAG_subprogram)
    DW_AT_external: 1
    DW_AT_name: (indirect string, offset: 0x1ab): main
    DW_AT_decl_file : 1
    DW_AT_decl_line : 8
    DW_AT_decl_column : 5
    DW_AT_type: <0xc3>
[Bug c/52981] Separate -Wpadded into two options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52981

krux changed:

What |Removed |Added
CC   |        |hoganmeier at gmail dot com

--- Comment #6 from krux ---
(In reply to Manuel López-Ibáñez from comment #4)
> This is quite easy to implement.

It's not as trivial as one might think. There's some copy-paste code to disable the flag in various places (instead of handling it inside if possible).

$ find -name '*.c' | xargs fgrep -nC2 warn_padded
./gcc/config/spu/spu.c-3959- /* We know this is being padded and we want it too. It is an internal
./gcc/config/spu/spu.c-3960- type so hide the warnings from the user. */
./gcc/config/spu/spu.c:3961: owp = warn_padded;
./gcc/config/spu/spu.c:3962: warn_padded = false;
./gcc/config/spu/spu.c-3963-
./gcc/config/spu/spu.c-3964- layout_type (record);
./gcc/config/spu/spu.c-3965-
./gcc/config/spu/spu.c:3966: warn_padded = owp;
./gcc/config/spu/spu.c-3967-
./gcc/config/spu/spu.c-3968- /* The correct type is an array type of one element. */
--
./gcc/config/tilegx/tilegx.c-340- /* We know this is being padded and we want it too. It is an
./gcc/config/tilegx/tilegx.c-341- internal type so hide the warnings from the user. */
./gcc/config/tilegx/tilegx.c:342: owp = warn_padded;
./gcc/config/tilegx/tilegx.c:343: warn_padded = false;
./gcc/config/tilegx/tilegx.c-344-
./gcc/config/tilegx/tilegx.c-345- layout_type (record);
./gcc/config/tilegx/tilegx.c-346-
./gcc/config/tilegx/tilegx.c:347: warn_padded = owp;
./gcc/config/tilegx/tilegx.c-348-
./gcc/config/tilegx/tilegx.c-349- /* The correct type is an array type of one element. */
--
./gcc/config/tilepro/tilepro.c-292- /* We know this is being padded and we want it too. It is an
./gcc/config/tilepro/tilepro.c-293- internal type so hide the warnings from the user. */
./gcc/config/tilepro/tilepro.c:294: owp = warn_padded;
./gcc/config/tilepro/tilepro.c:295: warn_padded = false;
./gcc/config/tilepro/tilepro.c-296-
./gcc/config/tilepro/tilepro.c-297- layout_type (record);
./gcc/config/tilepro/tilepro.c-298-
./gcc/config/tilepro/tilepro.c:299: warn_padded = owp;
./gcc/config/tilepro/tilepro.c-300-
./gcc/config/tilepro/tilepro.c-301- /* The correct type is an array type of one element. */
--
./gcc/fortran/trans-io.c-223- /* -Wpadded warnings on these artificially created structures are not
./gcc/fortran/trans-io.c-224- helpful; suppress them. */
./gcc/fortran/trans-io.c:225: int save_warn_padded = warn_padded;
./gcc/fortran/trans-io.c:226: warn_padded = 0;
./gcc/fortran/trans-io.c-227- gfc_finish_type (t);
./gcc/fortran/trans-io.c:228: warn_padded = save_warn_padded;
./gcc/fortran/trans-io.c-229- st_parameter[ptype].type = t;
./gcc/fortran/trans-io.c-230-}
./gcc/tree-nested.c-3197- /* In some cases the frame type will trigger the -Wpadded warning.
./gcc/tree-nested.c-3198-This is not helpful; suppress it. */
./gcc/tree-nested.c:3199: int save_warn_padded = warn_padded;
./gcc/tree-nested.c:3200: warn_padded = 0;
./gcc/tree-nested.c-3201- layout_type (root->frame_type);
./gcc/tree-nested.c:3202: warn_padded = save_warn_padded;
./gcc/tree-nested.c-3203- layout_decl (root->frame_decl, 0);
./gcc/tree-nested.c-3204-
[Bug c++/68901] UBSan triggers false -Wpadded warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68901 krux changed: What|Removed |Added CC||hoganmeier at gmail dot com --- Comment #3 from krux --- Yeah the warning is for an internal data structure, see .Lubsan_data: https://godbolt.org/z/hFo8dZ
[Bug c/52981] Separate -Wpadded into two options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52981 --- Comment #7 from krux --- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68901 is an example of missed -Wpadded suppression.
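For context, the GCC manual describes -Wpadded as warning when padding is added either to align a member or to align the whole structure, and those are the two cases this bug proposes to separate. A minimal sketch of each; the struct names are made up for illustration, and the exact padding sizes depend on the target ABI:

```cpp
#include <cstddef>

// Case 1: padding between members, so that 'i' starts at a properly
// aligned offset (on ABIs where int is more strictly aligned than char).
struct MemberPadded
{
    char c;
    int i;
};

// Case 2: padding after the last member, so that sizeof is a multiple
// of the struct's alignment and arrays of it stay aligned.
struct TailPadded
{
    int i;
    char c;
};

// These hold on any conforming implementation; the amount of padding
// itself is deliberately not asserted.
static_assert(offsetof(MemberPadded, i) % alignof(int) == 0, "member aligned");
static_assert(sizeof(TailPadded) % alignof(TailPadded) == 0, "size rounded up");
```

Compiling this with -Wpadded on a typical target should produce one warning of each kind, which makes it a handy test input when experimenting with the split.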
[Bug target/87076] -mcpu/-march not propagated through LTO bytecode (ice/segfault if arch flags do not match)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87076 krux changed: What|Removed |Added CC||hoganmeier at gmail dot com --- Comment #5 from krux --- *** Bug 90523 has been marked as a duplicate of this bug. ***
[Bug target/90523] lto1 segfault in arm_parse_cpu_option_name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90523

krux changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
Resolution |---         |DUPLICATE

--- Comment #4 from krux ---
The call stacks are slightly different, but it is probably still a duplicate.

*** This bug has been marked as a duplicate of bug 87076 ***
[Bug middle-end/82853] Optimize x % 3 == 0 without modulo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 krux changed: What|Removed |Added CC||hoganmeier at gmail dot com --- Comment #34 from krux --- Also fixes the duplicate https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12849. Can't close it though.
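As background, a well-known way to test divisibility by an odd constant without a division is to multiply by the constant's modular inverse and compare against a threshold. The sketch below shows that trick for 3 and 32-bit unsigned values; it illustrates the general technique and is not a claim about the exact code GCC emits. `divisible_by_3` and `agrees_below` are hypothetical names.

```cpp
#include <cstdint>

// 0xAAAAAAAB is the multiplicative inverse of 3 modulo 2^32
// (3 * 0xAAAAAAAB == 0x200000001). Multiplying by it maps the multiples
// of 3 onto 0 .. 0x55555555 and every other value above that range.
constexpr bool divisible_by_3(std::uint32_t x)
{
    return static_cast<std::uint32_t>(x * UINT32_C(0xAAAAAAAB)) <= UINT32_C(0x55555555);
}

// Compare against the plain modulo for all values below 'limit'.
constexpr bool agrees_below(std::uint32_t limit)
{
    for (std::uint32_t x = 0; x < limit; ++x)
        if (divisible_by_3(x) != (x % 3 == 0))
            return false;
    return true;
}

static_assert(agrees_below(1000), "matches x % 3 == 0 on a small range");
static_assert(divisible_by_3(UINT32_C(0xFFFFFFFF)), "largest multiple of 3");
```

The appeal is that one multiply and one compare replace the multiply, shift and subtract sequence needed to compute the remainder itself.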
[Bug c++/68901] UBSan triggers false -Wpadded warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68901 --- Comment #5 from krux --- Wpadded only checks for input_location != BUILTINS_LOCATION currently (stor-layout.c). Maybe something like !DECL_ARTIFICIAL(rli->t) should be added there.
[Bug c++/68901] UBSan triggers false -Wpadded warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68901 --- Comment #6 from krux --- Created attachment 46434 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46434&action=edit proposed patch
[Bug c++/68901] UBSan triggers false -Wpadded warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68901 --- Comment #7 from krux --- Created attachment 46435 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46435&action=edit cleanup The previous patch should also allow removing these hacks (untested). Though TYPE_ARTIFICIAL wasn't set in any of these cases. Is that normal?
[Bug target/87650] New: suboptimal codegen for testing low bit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87650

Bug ID: 87650
Summary: suboptimal codegen for testing low bit
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
Target Milestone: ---

int pow(int x, unsigned int n)
{
    int y = 1;
    while (n > 1)
    {
        auto m = n%2;
        n = n/2;
        if (m)
            y *= x;
        x = x*x;
    }
    return x*y;
}

produces

        mov     edx, esi
        and     edx, 1
        test    edx, edx

instead of just

        test    sil, 1

while clang chooses a branchless version: https://godbolt.org/z/L6VUZ1

Interestingly gcc does use test sil,1 if you get rid of m: godbolt.org/z/9oL1oc

Assembly analysis: https://stackoverflow.com/a/52877279/594456
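To make the comparison concrete, here are the two formulations side by side. The first is the code from the report; the second is a guess at what "getting rid of m" looks like, since the second godbolt link is not reproduced here. Both names are hypothetical, and both functions are meant to be semantically identical, so any codegen difference comes from the compiler alone.

```cpp
// As in the report: the low bit is stored in m before n is halved.
constexpr int pow_with_m(int x, unsigned int n)
{
    int y = 1;
    while (n > 1)
    {
        auto m = n % 2;
        n = n / 2;
        if (m)
            y *= x;
        x = x * x;
    }
    return x * y;
}

// Assumed variant without m: test the low bit first, halve n afterwards.
constexpr int pow_without_m(int x, unsigned int n)
{
    int y = 1;
    while (n > 1)
    {
        if (n % 2)
            y *= x;
        x = x * x;
        n = n / 2;
    }
    return x * y;
}

static_assert(pow_with_m(2, 10) == 1024, "2^10");
static_assert(pow_with_m(3, 5) == pow_without_m(3, 5), "same result");
```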
[Bug tree-optimization/87913] New: max(n, 1) code generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87913

Bug ID: 87913
Summary: max(n, 1) code generation
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
Target Milestone: ---

unsigned int f(unsigned int num) { return num < 1 ? 1 : num; }
int f2(int num) { return num < 1 ? 1 : num; }
unsigned int g(unsigned int num) { return num + !num; }

$ gcc -O3

f(unsigned int):
        mov     eax, edi
        test    edi, edi
        mov     edx, 1
        cmove   eax, edx
f2(int):
        test    edi, edi
        mov     eax, 1
        cmovg   eax, edi
g(unsigned int):
        xor     eax, eax
        test    edi, edi
        sete    al
        add     eax, edi

f and g could be:

f:
        test    edi, edi
        mov     eax, 1
        cmovne  eax, edi
g:
        cmp     edi, 1
        adc     edi, 0
        mov     eax, edi

https://godbolt.org/z/YJWjsQ
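The report relies on f and g computing the same value for unsigned input. A small compile-time check of that equivalence, with the three functions copied from the report and only `constexpr` added (the additions are for illustration, not part of the bug):

```cpp
// max(num, 1) written as a conditional, unsigned and signed.
constexpr unsigned int f(unsigned int num) { return num < 1 ? 1 : num; }
constexpr int f2(int num) { return num < 1 ? 1 : num; }

// Same value for unsigned input: !num is 1 only when num == 0.
constexpr unsigned int g(unsigned int num) { return num + !num; }

static_assert(f(0u) == 1u && g(0u) == 1u, "zero maps to one");
static_assert(f(7u) == 7u && g(7u) == 7u, "non-zero is unchanged");
static_assert(f(~0u) == g(~0u), "no overflow at the top of the range");
static_assert(f2(-3) == 1 && f2(4) == 4, "signed variant clamps negatives");
```

Since the two unsigned forms agree everywhere, a compiler is free to pick whichever instruction sequence is cheaper for either source form.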
[Bug tree-optimization/87914] New: gcc fails to vectorize bitreverse code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87914

Bug ID: 87914
Summary: gcc fails to vectorize bitreverse code
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
Target Milestone: ---

$ gcc -fopenmp-simd -O3 -march=haswell -fopt-info-vec-omp-optimized-missed

template <typename T>
T reverseBits(T x)
{
    unsigned int s = sizeof(x) * 8;
    T mask = ~T(0);
    while ((s >>= 1) > 0)
    {
        mask ^= (mask << s);
        x = ((x >> s) & mask) | ((x << s) & ~mask); // unsupported use in stmt
    }
    return x;
}

void test_reverseBits(unsigned* x)
{
    #pragma omp simd aligned(x:32)
    for (int i = 0; i < 16; ++i)
        x[i] = reverseBits(x[i]); // couldn't vectorize loop
}

clang and icc vectorize this: https://godbolt.org/z/ROJZGZ
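The loop in `reverseBits` swaps progressively smaller halves: with a 32-bit value the mask goes 0x0000FFFF, 0x00FF00FF, 0x0F0F0F0F, 0x33333333, 0x55555555 as s goes 16, 8, 4, 2, 1. A constexpr restatement with a few known values can serve as a scalar reference; it assumes the stripped template header was `template <typename T>`, and the added `constexpr` is not in the report.

```cpp
#include <cstdint>

// Bit reversal by repeatedly swapping halves, as in the report.
template <typename T>
constexpr T reverseBits(T x)
{
    unsigned int s = sizeof(x) * 8;
    T mask = ~T(0);
    while ((s >>= 1) > 0)
    {
        mask ^= (mask << s);                         // select the low half of each 2*s-bit group
        x = ((x >> s) & mask) | ((x << s) & ~mask);  // swap the two halves of each group
    }
    return x;
}

static_assert(reverseBits<std::uint32_t>(1u) == 0x80000000u, "lowest bit becomes highest");
static_assert(reverseBits<std::uint32_t>(0x12345678u) == 0x1E6A2C48u, "known value");
static_assert(reverseBits<std::uint8_t>(0x01) == 0x80, "also works for 8-bit types");
static_assert(reverseBits(reverseBits<std::uint32_t>(0xDEADBEEFu)) == 0xDEADBEEFu, "involution");
```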
[Bug tree-optimization/87915] New: emit warning if (explicit) vectorization failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87915 Bug ID: 87915 Summary: emit warning if (explicit) vectorization failed Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hoganmeier at gmail dot com Target Milestone: --- When using #pragma omp simd for explicit vectorization shouldn't it warn if vectorization failed? clang has -Wpass-failed for that: http://lists.llvm.org/pipermail/cfe-dev/2015-July/044226.html
[Bug tree-optimization/87915] emit warning if (explicit) vectorization failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87915 --- Comment #2 from krux --- Yeah I'm using -fopt-info for manual performance analysis but that can't be enabled in the normal build as it's too noisy. Furthermore a proper warning can be turned into an error to ensure that developer expectations are met by the compiler.
[Bug target/87913] max(n, 1) code generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87913 --- Comment #2 from krux --- The case of function g is quite interesting because of the data dependencies and adc's latency: https://godbolt.org/z/0V8Dlx
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #4 from krux --- +1 The builtins already produce better code than a generic bitreverse implementation: https://godbolt.org/z/Um2Tit But using special hardware instructions automatically is even more important imho.
[Bug middle-end/12849] testing divisibility by constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12849 krux changed: What|Removed |Added CC||hoganmeier at gmail dot com --- Comment #5 from krux --- Should be fixed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853
[Bug c++/87656] Useful flags to enable with -Wall or -Wextra
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87656 krux changed: What|Removed |Added CC||hoganmeier at gmail dot com --- Comment #3 from krux --- -Wshadow, at least the local variant, would indeed be really nice in -Wall or at least -Wextra. The global one is still too noisy because of class constructors: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78147 I always use -Wall -W -Wshadow -Wconversion -Wsign-conversion.
[Bug c++/45615] -Wshadow doesn't report class member shadowing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45615 krux changed: What|Removed |Added CC||hoganmeier at gmail dot com --- Comment #2 from krux --- Confirmed on trunk: https://godbolt.org/z/jL0ony
[Bug c++/87656] Useful flags to enable with -Wall or -Wextra
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87656 --- Comment #5 from krux --- I meant -Wshadow=local.
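For readers unfamiliar with the distinction: per the GCC manual, -Wshadow=local warns when a local variable shadows another local variable or parameter. A minimal sketch of the kind of bug it catches; `sum_below` is a made-up example, not taken from the bug report.

```cpp
// Intended to sum 0 .. n-1, but the inner declaration shadows the outer
// 'total', so the outer variable is never updated.
constexpr int sum_below(int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
    {
        int total = i;   // shadows the outer 'total': flagged by -Wshadow=local
        (void)total;     // silence the unused-variable warning
    }
    return total;        // always 0
}

static_assert(sum_below(5) == 0, "the shadowed accumulator is never touched");
```

The code compiles cleanly under -Wall -Wextra alone, which is the argument for enabling the local variant by default.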
[Bug target/88013] New: can't vectorize rgb to grayscale conversion code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88013 Bug ID: 88013 Summary: can't vectorize rgb to grayscale conversion code Product: gcc Version: 7.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hoganmeier at gmail dot com Target Milestone: --- #include void reference_convert(uint8_t * __restrict dest, uint8_t * __restrict src, int n) { for (int i=0; ihttps://godbolt.org/z/FPG3k_
[Bug target/88013] can't vectorize rgb to grayscale conversion code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88013 --- Comment #1 from krux --- Something like -march=armv8-a -mfpu=neon-fp-armv8 does not work either. https://godbolt.org/z/MpBQ0I
[Bug target/88013] can't vectorize rgb to grayscale conversion code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88013 --- Comment #3 from krux --- A few NEON instructions are sufficient: https://web.archive.org/web/20170227190422/http://hilbert-space.de/?p=22 clang seems to generate similar code, see the godbolt links.
[Bug target/88013] can't vectorize rgb to grayscale conversion code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88013 --- Comment #4 from krux --- On x64 indeed both compilers generate a huge amount of code. https://godbolt.org/z/TH7mqn
[Bug target/88013] can't vectorize rgb to grayscale conversion code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88013

--- Comment #6 from krux ---
-mfloat-abi=hard was missing indeed. It's a pity there's no warning like when trying to use the intrinsics.

Still I see a lot more instructions, maybe that got fixed after v7.2?
https://godbolt.org/z/OWzgXi

        vld3.8    {d16, d18, d20}, [r3]
        add       ip, r3, #24
        add       lr, lr, #1
        add       r3, r3, #48
        cmp       lr, r5
        vld3.8    {d17, d19, d21}, [ip]
        vmovl.u8  q5, d16
        vmovl.u8  q15, d18
        vmovl.u8  q11, d17
        vmovl.u8  q4, d19
        vmovl.u8  q0, d20
        vmovl.u8  q1, d21
        vmull.s16 q6, d10, d28
        vmull.s16 q3, d22, d28
        vmull.s16 q2, d30, d26
        vmull.s16 q11, d23, d29
        vmull.s16 q15, d31, d27
        vmull.s16 q5, d11, d29
        vmull.s16 q9, d8, d26
        vmull.s16 q8, d9, d27
        vadd.i32  q2, q6, q2
        vadd.i32  q10, q5, q15
        vadd.i32  q9, q3, q9
        vmull.s16 q15, d0, d24
        vadd.i32  q8, q11, q8
        vmull.s16 q3, d2, d24
        vmull.s16 q0, d1, d25
        vmull.s16 q1, d3, d25
        vadd.i32  q11, q2, q15
        vadd.i32  q9, q9, q3
        vadd.i32  q10, q10, q0
        vadd.i32  q8, q8, q1
        vshr.s32  q11, q11, #8
        vshr.s32  q9, q9, #8
        vshr.s32  q10, q10, #8
        vshr.s32  q8, q8, #8
        vmovn.i32 d30, q11
        vmovn.i32 d31, q10
        vmovn.i32 d20, q9
        vmovn.i32 d21, q8
        vmovn.i16 d16, q15
        vmovn.i16 d17, q10
        vst1.8    {q8}, [r4]
[Bug tree-optimization/88440] New: size optimization of memcpy-like code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440

Bug ID: 88440
Summary: size optimization of memcpy-like code
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
Target Milestone: ---

https://godbolt.org/z/RTji7B

void foo(char* restrict dst, const char* buf)
{
    for (int i=0; i<8; ++i)
        *dst++ = *buf++;
}

$ gcc -Os
$ gcc -O2
.L2:
        mov     dl, BYTE PTR [rsi+rax]
        mov     BYTE PTR [rdi+rax], dl
        inc     rax
        cmp     rax, 8
        jne     .L2

$ gcc -O3
        mov     rax, QWORD PTR [rsi]
        mov     QWORD PTR [rdi], rax

$ arm-none-eabi-gcc -O3 -mthumb -mcpu=cortex-m4
        ldr     r3, [r1]        @ unaligned
        ldr     r2, [r1, #4]    @ unaligned
        str     r2, [r0, #4]    @ unaligned
        str     r3, [r0]        @ unaligned

The -O3 code is both faster and smaller for both ARM and x64:
"note: Loop 1 distributed: split to 0 loops and 1 library calls."

Should be considered for -O2 and -Os as well.
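The "1 library calls" note indicates that at -O3 the loop is recognized as a fixed-size copy. Written by hand, the two forms look like this. It is only a sketch: the names `copy8_loop`, `copy8_memcpy` and `check_copies` are hypothetical, and `restrict` is dropped because it is not standard C++, so the non-overlap requirement of `memcpy` becomes the caller's responsibility here.

```cpp
#include <cassert>
#include <cstring>

// Byte-wise loop, as in the report (without the restrict qualifier).
inline void copy8_loop(char* dst, const char* buf)
{
    for (int i = 0; i < 8; ++i)
        *dst++ = *buf++;
}

// Hand-written equivalent of the pattern the -O3 note describes:
// a single fixed-size copy. Buffers must not overlap.
inline void copy8_memcpy(char* dst, const char* buf)
{
    std::memcpy(dst, buf, 8);
}

// Runtime check that both versions produce the same bytes.
inline void check_copies()
{
    const char src[8] = {'g', 'c', 'c', '-', '8', '.', '2', '!'};
    char a[8] = {};
    char b[8] = {};
    copy8_loop(a, src);
    copy8_memcpy(b, src);
    assert(std::memcmp(a, b, sizeof a) == 0);
    assert(std::memcmp(a, src, sizeof a) == 0);
}
```

Calling `check_copies()` from a test or from `main` is enough to confirm the two forms are interchangeable for non-overlapping buffers; writing the `memcpy` form directly is one way to get the compact code at -Os today.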
[Bug tree-optimization/88440] size optimization of memcpy-like code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440 --- Comment #3 from krux --- Adding -ftree-loop-distribute-patterns to -Os does not seem to make a difference though.
[Bug c++/38658] trivial try/catch statement not eliminated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38658 krux changed: What|Removed |Added CC||hoganmeier at gmail dot com --- Comment #5 from krux --- https://godbolt.org/z/rnDy8l
[Bug debug/88534] New: internal compiler error: in tree_add_const_value_attribute, at dwarf2out.c:20246
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88534

Bug ID: 88534
Summary: internal compiler error: in tree_add_const_value_attribute, at dwarf2out.c:20246
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: debug
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
Target Milestone: ---

#include
#include
#include

template <typename CharT, size_t N>
class basic_fixed_string
{
    CharT content[N];
public:
    using char_type = CharT;

    template <size_t... I>
    constexpr basic_fixed_string(const CharT (&input)[N], std::index_sequence<I...>) noexcept
        : content{input[I]...}
    { }

    constexpr basic_fixed_string(const CharT (&input)[N]) noexcept
        : basic_fixed_string(input, std::make_index_sequence<N>())
    { }

    constexpr size_t size() const noexcept
    {
        // string literals are zero terminated
        if (content[N-1] == '\0')
            return N - 1;
        else
            return N;
    }
    constexpr CharT operator[](size_t i) const noexcept { return content[i]; }
    constexpr const CharT * begin() const noexcept { return content; }
    constexpr const CharT * end() const noexcept { return content + size(); }
};

template <typename CharT, size_t N>
basic_fixed_string(const CharT (&)[N]) -> basic_fixed_string<CharT, N>;

template <basic_fixed_string Str>
struct F { };

auto foo() { F<"test"> f; }

# g++ -O3 -std=c++2a -g -S
:46:1: internal compiler error: in tree_add_const_value_attribute, at dwarf2out.c:20246
[Bug debug/88534] internal compiler error: in tree_add_const_value_attribute, at dwarf2out.c:20246
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88534 krux changed: What|Removed |Added Keywords||ice-on-valid-code --- Comment #1 from krux --- https://godbolt.org/z/G-9Zqh
[Bug debug/88534] internal compiler error: in tree_add_const_value_attribute, at dwarf2out.c:20246
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88534 --- Comment #2 from krux --- The code is based on https://github.com/hanickadot/compile-time-regular-expressions/blob/master/include/ctll/fixed_string.hpp
[Bug c/88566] New: -Wconversion not using value range information
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88566

Bug ID: 88566
Summary: -Wconversion not using value range information
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: hoganmeier at gmail dot com
Target Milestone: ---

https://godbolt.org/z/p0RMde

unsigned char foo(uint8_t pin)
{
    if (pin >= 3 && pin <= 6) return pin - 2;
    if (pin >= 9 && pin <= 10) return pin - 4;
    if (pin >= 20 && pin <= 23) return pin - 13;
    return 0;
}

$ gcc -O3 -Wconversion -S
:5:39: warning: conversion from 'int' to 'uint8_t' {aka 'unsigned char'} may change value [-Wconversion]
5 | if (pin >= 3 && pin <= 6) return pin - 2;
  | ^~~

gcc should be aware that the value is well within the uint8_t range.
[Bug c/88566] -Wconversion not using value range information
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88566 --- Comment #1 from krux --- Even simpler example: uint8_t foo(uint8_t pin) { return pin > 0 ? pin - 1 : 0; }
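Until the warning takes value ranges into account, the usual workaround is an explicit cast, since -Wconversion only diagnoses implicit conversions. A sketch of the first example rewritten that way; the casts and the `constexpr` are additions for illustration, and the cast of course also hides cases where the value really would not fit.

```cpp
#include <cstdint>

// pin - N is computed as int (integer promotion); the explicit cast
// documents that the narrowing back to uint8_t is intended.
constexpr std::uint8_t foo(std::uint8_t pin)
{
    if (pin >= 3 && pin <= 6)   return static_cast<std::uint8_t>(pin - 2);
    if (pin >= 9 && pin <= 10)  return static_cast<std::uint8_t>(pin - 4);
    if (pin >= 20 && pin <= 23) return static_cast<std::uint8_t>(pin - 13);
    return 0;
}

static_assert(foo(3) == 1 && foo(6) == 4, "first range");
static_assert(foo(9) == 5 && foo(10) == 6, "second range");
static_assert(foo(20) == 7 && foo(23) == 10, "third range");
static_assert(foo(0) == 0 && foo(7) == 0, "everything else");
```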