[Bug target/81621] New: ICE in delete_insn, at cfgrtl.c:167 with s390x cross compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81621

Bug ID: 81621
Summary: ICE in delete_insn, at cfgrtl.c:167 with s390x cross compiler
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: ice-on-valid-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: marxin at gcc dot gnu.org
Target Milestone: ---
Host: x86_64-linux-gnu
Target: s390x-linux-gnu

Running cross compiler ICEs:

$ s390x-linux-gnu-gcc /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/graphite/scop-10.c -Og -fno-split-wide-types -freorder-blocks-and-partition
0xdeadbeef delete_insn(rtx_insn*)
        .././../gcc/cfgrtl.c:167
0xdeadbeef move_unallocated_pseudos
        .././../gcc/ira.c:5041
0xdeadbeef ira
        .././../gcc/ira.c:5399
0xdeadbeef execute
        .././../gcc/ira.c:5581
[Bug c/79586] missing -Wdeprecated depending on position of attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79586 Eric Gallager changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-07-31 CC||egallager at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Eric Gallager --- Confirmed.
[Bug tree-optimization/81620] [8 Regression] ICE in is_inv_store_elimination_chain, at tree-predcom.c:1651 with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81620 Martin Liška changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-07-31 CC||amker at gcc dot gnu.org, ||marxin at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Martin Liška --- Confirmed, started with r250670.
[Bug lto/81612] lto1: internal compiler error: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81612 Martin Liška changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2017-07-31 Ever confirmed|0 |1 --- Comment #1 from Martin Liška --- Can you please attach a pre-processed source code that triggers that?
[Bug boehm-gc/64042] FAIL: boehm-gc.c/gctest.c -O2 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64042 --- Comment #15 from Tom de Vries --- Subject: Re: [Gc] boehm-gc.c/gctest.c spurious failure From: bo...@acm.org To: tom_devr...@mentor.com CC: bd...@lists.opendylan.org Date: 01/21/2015 09:11 PM I haven't had a chance to look at this carefully. But the typed allocation test looks a bit fishy. It seems to allocate a 2000 byte object described by a 320 bit entry bit map, each of which describes a pointer-sized word, IIRC. That worked fine once upon a time when this code was written and we only had 32-bit machines ... Hans
[Bug boehm-gc/64042] FAIL: boehm-gc.c/gctest.c -O2 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64042 --- Comment #14 from Tom de Vries --- Subject: boehm-gc.c/gctest.c spurious failure From: tom_devr...@mentor.com To: bd...@lists.opendylan.org Date: 01/19/2015 10:09 AM Hi, FYI, with gcc trunk on x86_64 Linux, I ran into PR64042: 'FAIL: boehm-gc.c/gctest.c -O2 execution test'. ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64042 ) The testcase gctest fails spuriously, a couple of times per 1000 runs. The failure has been reproduced by others, also on Darwin. Any information on this is appreciated. Thanks, - Tom
[Bug boehm-gc/64042] FAIL: boehm-gc.c/gctest.c -O2 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64042 --- Comment #13 from Tom de Vries --- (In reply to Eric Gallager from comment #12) > (In reply to Tom de Vries from comment #11) > > Reported upstream here: > > https://lists.opendylan.org/pipermail/bdwgc/2015-January/006071.html > > This link doesn't work for me; is there a better upstream bug link URL? The archive seems to be down, and also gmane doesn't seem to work. I'll post the thread here.
[Bug c/61342] Segfault when using default clause and VLA in OpenMP task
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61342 Eric Gallager changed: What|Removed |Added Keywords||openmp Status|UNCONFIRMED |NEW Last reconfirmed||2017-07-31 CC||egallager at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from Eric Gallager --- Confirmed, the stray quote mark printed preceding the ICE message looks suspicious, too.
gcc-bugs@gcc.gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30552 Eric Gallager changed: What|Removed |Added Last reconfirmed|2008-12-29 14:22:53 |2017-7-30 CC||egallager at gcc dot gnu.org Summary|gcc crashed when compiling |gcc crashes when compiling |an example |examples with GNU statement ||expressions in VLAs (also ||involved: nested functions ||declared K&R-style) Known to fail||8.0 --- Comment #3 from Eric Gallager --- Confirmed that gcc still ICEs, although I'm not sure if the code is valid or not... I'll leave the "ice-on-valid-code" keyword for now; someone else more knowledgeable than me can change it if necessary.
[Bug c/79320] sqrt of negative number do not return NaN with i686-w64-mingw32-gcc on pentiumI7/Windows10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79320 Eric Gallager changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED URL||https://sourceforge.net/p/m ||ingw-w64/bugs/567/ CC||egallager at gcc dot gnu.org Resolution|--- |MOVED --- Comment #3 from Eric Gallager --- (In reply to Daniel WEIL from comment #2) > OK. I log the issue on mingw bugs : > https://sourceforge.net/p/mingw/bugs/2337/ Linked bug shows that that one was closed in favor of: https://sourceforge.net/p/mingw-w64/bugs/567/ So closing to mark as MOVED to the mingw-w64 one.
[Bug c/70257] #line incorrectly handled in error messages
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70257 Manuel López-Ibáñez changed: What|Removed |Added CC||manu at gcc dot gnu.org --- Comment #2 from Manuel López-Ibáñez --- I think this is a dup of bug 79106. The caret line is printed by reopening the file and counting 3 lines because the line directive is believed by GCC to point to the actual source code.
[Bug c/79010] -Wlarger-than ineffective for VLAs, alloca, malloc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79010 Eric Gallager changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-07-30 CC||egallager at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Eric Gallager --- Confirmed.
[Bug sanitizer/81340] ICE in compute_bb_dataflow, at var-tracking.c:6877
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81340 --- Comment #5 from Daniel Black --- Thank you, Martin.
[Bug c/78155] missing warning on invalid isalpha et al.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78155 Eric Gallager changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-07-30 CC||egallager at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Eric Gallager --- When I run the program, it prints 0 rather than crashing. Confirming that a warning would be nice though, for portability to platforms where it would cause a crash.
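For context, the portability hazard behind this request can be sketched in C. The `<ctype.h>` functions take an `int` whose value must be representable as `unsigned char` or equal `EOF`; passing a negative plain `char` is undefined behavior, which is why it may print 0 on one platform and crash on another. The helper name `is_alpha_safe` is hypothetical and is not taken from the bug's test case.

```c
#include <ctype.h>

/* Hypothetical helper (not from the bug report): convert to unsigned char
   before calling isalpha, so a negative plain char never reaches it. */
int is_alpha_safe(char c)
{
    return isalpha((unsigned char)c) != 0;
}
```

A call such as `is_alpha_safe((char)0xE9)` is then well defined regardless of whether plain `char` is signed on the target.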
[Bug c/63710] Incorrect column number for -Wconversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63710 Eric Gallager changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-07-30 CC||egallager at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from Eric Gallager --- Confirmed. The location for the first one is still the same, but the location for the second one has changed: $ /usr/local/bin/gcc -c -Wconversion 63710.c 63710.c: In function ‘f1’: 63710.c:2:24: warning: conversion to ‘long unsigned int’ from ‘char’ may change the sign of the result [-Wsign-conversion] unsigned long r1 = ul + l; ^ 63710.c:3:23: warning: conversion to ‘long unsigned int’ from ‘char’ may change the sign of the result [-Wsign-conversion] unsigned long r2 = l + ul; ^ 63710.c: In function ‘f2’: 63710.c:8:15: warning: conversion to ‘unsigned int’ from ‘char’ may change the sign of the result [-Wsign-conversion] return l ? l : c; ~~^~~ 63710.c:8:15: warning: conversion to ‘unsigned int’ from ‘long int’ may change the sign of the result [-Wsign-conversion] $ I agree that both could still be better.
[Bug target/81602] Unnecessary zero-extension after 16 bit popcnt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81602

--- Comment #1 from Uroš Bizjak ---
(In reply to Christoph Diegelmann from comment #0)
> GCC misses an optimization on this:
>
> #include <cstdint>
> #include "immintrin.h"
>
> void test(std::uint16_t* mask, std::uint16_t* data) {
>     for (int i = 0; i < 1024; ++i) {
>         *data = 0;
>         unsigned tmp = *mask++;
>         unsigned step = _mm_popcnt_u32(tmp);
>         data += step;
>     }
> }
>
> g++ -O3 -Wall -std=c++14 -march=skylake generates:
>
> test(unsigned short*, unsigned short*):
>         leaq    2048(%rdi), %rdx
> .L2:
>         xorl    %eax, %eax
>         addq    $2, %rdi
>         movw    %ax, (%rsi)
>         popcntw -2(%rdi), %ax
>         movzwl  %ax, %eax
>         leaq    (%rsi,%rax,2), %rsi
>         cmpq    %rdx, %rdi
>         jne     .L2
>         ret
>
> The rax register is known to be zero at the time of `popcntw -2(%rdi), %ax`.
> Anyway gcc still clears the upper bits using `movzwl %ax, %eax` afterwards.

The "xorl %eax, %eax; movw %ax, (%rsi)" pair is just an optimized way to implement "movw $0, (%rsi);". It just happens that the peephole pass found the unused %eax as an empty temporary reg when splitting the direct move of an immediate to memory.

> While clang uses 32 bit popcnt and `movzwl (%rdi,%rax,2), %ecx` it correctly
> recognises that there's no need to clear the upper bits.
>
> clang -O3 -Wall -std=c++14 -march=skylake -fno-unroll-loops generates:
>
> test(unsigned short*, unsigned short*):
>         xorl    %eax, %eax
> .LBB0_1:
>         movw    $0, (%rsi)
>         movzwl  (%rdi,%rax,2), %ecx
>         popcntl %ecx, %ecx
>         leaq    (%rsi,%rcx,2), %rsi
>         addq    $1, %rax
>         cmpl    $1024, %eax  # imm = 0x400
>         jne     .LBB0_1
>         retq

popcntl has a false dependency on its output in certain situations, where popcntw doesn't have this limitation. So, gcc chose this approach for a reason.
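To experiment with the loop from comment #0 outside of C++, a plain C rendering might look like the sketch below. It is an assumption-laden stand-in, not the reporter's test case: `__builtin_popcount` (a GCC/Clang builtin) replaces `_mm_popcnt_u32`, the hypothetical parameter `n` replaces the hard-coded 1024, and the function name `scatter_zeros` is invented here. The caller must supply a `data` buffer large enough for the sum of all popcounts plus one element.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of the reported loop. Returns how many
   elements the data pointer advanced in total. */
size_t scatter_zeros(const uint16_t *mask, uint16_t *data, size_t n)
{
    uint16_t *start = data;
    for (size_t i = 0; i < n; ++i) {
        *data = 0;
        /* The 16-bit load is zero-extended, so the popcount is 0..16. */
        unsigned tmp = mask[i];
        unsigned step = (unsigned)__builtin_popcount(tmp);
        data += step;
    }
    return (size_t)(data - start);
}
```

Compiling this with `-O3 -march=skylake -S` should make it possible to check whether a given compiler version still emits the extra `movzwl`; no particular output is claimed here.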
[Bug target/25967] Add attribute naked for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25967 Uroš Bizjak changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2017-07-30 Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Target Milestone|--- |8.0 Ever confirmed|0 |1 --- Comment #16 from Uroš Bizjak --- Patch at [1]. [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01968.html
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #7 from PeteVine --- Thanks for pointing that out! I was using my bash history to change the CFLAGS and when I was flipping the crc switch I didn't notice I'd picked a version without -frename-registers, hence this wrong conclusion :) Definitely then, -frename-registers it is! http://openbenchmarking.org/result/1707307-RI-CORTEXA5313
[Bug web/43887] stable anchors needed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43887 Manuel López-Ibáñez changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #1 from Manuel López-Ibáñez --- This seems to have been fixed recently and anchor names for options are not numbered anymore.
[Bug tree-optimization/81620] [8 Regression] ICE in is_inv_store_elimination_chain, at tree-predcom.c:1651 with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81620 Andrew Pinski changed: What|Removed |Added Keywords||ice-on-valid-code Target||x86_64-pc-linux-gnu Component|c |tree-optimization Version|unknown |8.0 Target Milestone|--- |8.0 Summary|ICE on valid code at -O3 in |[8 Regression] ICE in |both 32-bit and 64-bit |is_inv_store_elimination_ch |modes on x86_64-linux-gnu |ain, at tree-predcom.c:1651 |(internal compiler error: |with -O3 |in | |is_inv_store_elimination_ch | |ain, at | |tree-predcom.c:1651)|
[Bug sanitizer/81619] pairs of mmap/munmap do not reset asan's user-poisoning flags, leading to invalid error reports
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81619 Daniel Villeneuve changed: What|Removed |Added Attachment #41863|0 |1 is obsolete|| --- Comment #4 from Daniel Villeneuve --- Created attachment 41866 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41866&action=edit small C program showing the problem on Linux
[Bug c/81620] New: ICE on valid code at -O3 in both 32-bit and 64-bit modes on x86_64-linux-gnu (small.c:3:5: internal compiler error: in is_inv_store_elimination_chain, at tree-predcom.c:1651)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81620

Bug ID: 81620
Summary: ICE on valid code at -O3 in both 32-bit and 64-bit modes on x86_64-linux-gnu (small.c:3:5: internal compiler error: in is_inv_store_elimination_chain, at tree-predcom.c:1651)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: chengniansun at gmail dot com
Target Milestone: ---

$ gcc-trunk -v
Using built-in specs.
COLLECT_GCC=gcc-trunk
COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/8.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-source-trunk/configure --enable-languages=c,c++,lto --prefix=/usr/local/gcc-trunk --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20170730 (experimental) [trunk revision 250721] (GCC)
$
$ gcc-trunk -O3 small.c
during GIMPLE pass: pcom
small.c: In function ‘main’:
small.c:3:5: internal compiler error: in is_inv_store_elimination_chain, at tree-predcom.c:1651
 int main() {
     ^~~~
0xd25e10 is_inv_store_elimination_chain
        ../../gcc-source-trunk/gcc/tree-predcom.c:1651
0xd25e10 prepare_initializers_chain_store_elim
        ../../gcc-source-trunk/gcc/tree-predcom.c:2786
0xd25e10 prepare_initializers_chain
        ../../gcc-source-trunk/gcc/tree-predcom.c:2846
0xd25e10 prepare_initializers
        ../../gcc-source-trunk/gcc/tree-predcom.c:2901
0xd25e10 tree_predictive_commoning_loop
        ../../gcc-source-trunk/gcc/tree-predcom.c:3092
0xd25e10 tree_predictive_commoning()
        ../../gcc-source-trunk/gcc/tree-predcom.c:3170
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
$
$ cat small.c
int a[7];
char b;
int main() {
  b = 4;
  for (; b; b--) {
    a[b] = b;
    a[b + 2] = 1;
  }
  return 0;
}
$
[Bug sanitizer/81619] pairs of mmap/munmap do not reset asan's user-poisoning flags, leading to invalid error reports
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81619 --- Comment #3 from Andrew Pinski --- This might be a bug in the upstream sources too.
[Bug sanitizer/81619] pairs of mmap/munmap do not reset asan's user-poisoning flags, leading to invalid error reports
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81619 --- Comment #2 from Daniel Villeneuve --- Created attachment 41865 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41865&action=edit shell script to invoke program in different configurations
[Bug sanitizer/81619] pairs of mmap/munmap do not reset asan's user-poisoning flags, leading to invalid error reports
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81619 --- Comment #1 from Daniel Villeneuve --- Created attachment 41864 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41864&action=edit Makefile to build program
[Bug sanitizer/81619] New: pairs of mmap/munmap do not reset asan's user-poisoning flags, leading to invalid error reports
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81619 Bug ID: 81619 Summary: pairs of mmap/munmap do not reset asan's user-poisoning flags, leading to invalid error reports Product: gcc Version: 6.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: sanitizer Assignee: unassigned at gcc dot gnu.org Reporter: dvilleneuve at kronos dot com CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org Target Milestone: --- Created attachment 41863 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41863&action=edit small C program showing the problem on Linux When using mmap/munmap from an application, memory returned by mmap is not seen by the address sanitizer in a newly-initialized state: it might still be marked with user-poisoning flags. This is unlike using malloc/free pairs, where memory obtained from malloc, although possibly reused after being freed, is correctly initialized. By looking at the code for the sanitizer (gcc 6.3.0), I could figure out that malloc/free do some reinitialization of memory flags. I could not find such code for mmap/munmap. A workaround in the application is to explicitly call ASAN_UNPOISON_MEMORY_REGION prior to invoking munmap.
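The workaround described in the report could be sketched roughly as follows. This is not the attached test program: the helper name `map_poison_unmap` and the `BUILT_WITH_ASAN` macro are hypothetical, and the no-op fallback macros are an assumption added so the sketch also builds without `-fsanitize=address`. `ASAN_POISON_MEMORY_REGION` and `ASAN_UNPOISON_MEMORY_REGION` come from `<sanitizer/asan_interface.h>`.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Detect an ASan build with either GCC's or Clang's convention. */
#if defined(__SANITIZE_ADDRESS__)
#define BUILT_WITH_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define BUILT_WITH_ASAN 1
#endif
#endif

#ifdef BUILT_WITH_ASAN
#include <sanitizer/asan_interface.h>
#else
/* Assumption: stand-in no-ops so the sketch builds without ASan. */
#define ASAN_POISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#endif

/* Hypothetical helper: map a region, poison it as an application-level
   allocator might, then unpoison it before munmap so that a later mmap
   reusing the same addresses does not inherit the poisoning.
   Returns 0 on success, -1 on failure. */
int map_poison_unmap(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0, len);
    ASAN_POISON_MEMORY_REGION(p, len);
    /* The workaround from the report: clear user poisoning first. */
    ASAN_UNPOISON_MEMORY_REGION(p, len);
    return munmap(p, len);
}
```

Wrapping `munmap` this way keeps the unpoisoning next to the unmapping, so callers cannot forget it.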
[Bug c/61939] warn when __attribute__((aligned(x))) is ignored
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61939

Eric Gallager changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-07-30
                 CC|                            |egallager at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #3 from Eric Gallager ---
I had to modify the original testcase a bit to get it to compile:

$ cat 61939.c
struct some_struct { int foo; };
void copy_something(void *p, const void *s) {
    struct some_struct __attribute__((aligned(8))) *_d = p;
    struct some_struct __attribute__((aligned(8))) *_s = s;
    *_d = *_s;
}
$ /usr/local/bin/gcc -c -Wall -Wextra -pedantic -Wcast-align -Wattributes 61939.c
61939.c: In function ‘copy_something’:
61939.c:4:58: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
     struct some_struct __attribute__((aligned(8))) *_s = s;
                                                          ^
$

But beyond that, yeah, confirmed. I think there's probably a duplicate around here somewhere but I've forgotten the number already...
[Bug target/81614] Should -mtune-ctrl=partial_reg_stall be turned by default?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81614 --- Comment #5 from Uroš Bizjak --- (In reply to Cody Gray from comment #3) > > Also, it is hard to confirm tuning PRs without hard benchmark data. > > No, it really isn't. I know that's a canned response, likely brought about > by hard-won experience with a lot of dubious "tuning" feature requests, but > it's just a cop-out in this case, if not outright dismissive. Partial > register stalls are a well-documented phenomenon, confirmed by multiple > sources, and have been a significant source of performance degradation since > the Pentium Pro was released circa 1995. Well, then please find some representative benchmark suite and test the effect of -mtune-ctrl=partial_reg_stall on your target. There are plenty of benchmarks listed at [1]. It is a one-line change in the compiler source to set the new default then. [1] https://gcc.opensuse.org/specs/cxx_groups
[Bug target/81614] Should -mtune-ctrl=partial_reg_stall be turned by default?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81614 H.J. Lu changed: What|Removed |Added Status|RESOLVED|REOPENED Last reconfirmed||2017-07-30 Resolution|DUPLICATE |--- Ever confirmed|0 |1 --- Comment #4 from H.J. Lu --- -mtune-ctrl=partial_reg_stall is turned on only for -mtune=i686. We should examine it for Nehalem and above processors.
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #6 from Andrew Pinski --- (In reply to PeteVine from comment #5) > Turns out the GCC 8 regression is caused by the +crc switch in > -march=armv8-a+crc. Interesting, eh? +crc should not cause any code generation difference ...
[Bug target/25967] Add attribute naked for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25967 --- Comment #15 from Uroš Bizjak --- Please also note this description from the gcc docs: 'naked' This attribute allows the compiler to construct the requisite function declaration, while allowing the body of the function to be assembly code. The specified function will not have prologue/epilogue sequences generated by the compiler. Only basic 'asm' statements can safely be included in naked functions (*note Basic Asm::). While using extended 'asm' or a mixture of basic 'asm' and C code may appear to work, they cannot be depended upon to work reliably and are not supported.
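As a rough illustration of the documentation quoted above, a naked function whose body is a single basic `asm` statement might be written as below. This is only a sketch under stated assumptions: it presumes an x86 target and a compiler that accepts the attribute there (GCC 8 or later, going by this bug's target milestone, or Clang), and it falls back to an ordinary function otherwise. The names `bare_return` and `call_bare_return` are hypothetical.

```c
/* Assumption: x86 target and a compiler that accepts the naked attribute
   there. Otherwise fall back to an ordinary function so the sketch still
   builds. */
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 8))
__attribute__((naked)) void bare_return(void)
{
    /* Only basic asm: no prologue or epilogue is generated, so the body
       has to return on its own. */
    __asm__("ret");
}
#else
void bare_return(void) { }
#endif

/* Ordinary caller; returns 1 once bare_return has come back. */
int call_bare_return(void)
{
    bare_return();
    return 1;
}
```

The body deliberately avoids extended `asm` and C statements, in line with the restriction in the quoted text.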
[Bug sanitizer/81601] [7/8 Regression] incorrect Warray-bounds warning with -fsanitize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601 --- Comment #6 from Patrick Palka --- (In reply to Patrick Palka from comment #5) > So what's the right way to fix this? To move optimize_bit_field_compare() > from fold_binary to match.pd so that the conditions on ... so that conditions on tp->chrono_type get consistently transformed into BIT_FIELD_REFs, or to remove optimize_bit_field_compare() altogether? It seems like a rather low-level optimization to be done in GENERIC/GIMPLE.
[Bug sanitizer/81601] [7/8 Regression] incorrect Warray-bounds warning with -fsanitize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601 --- Comment #5 from Patrick Palka --- So what's the right way to fix this? To move optimize_bit_field_compare() from fold_binary to match.pd so that the conditions on
[Bug tree-optimization/81354] [5/6 Regression] Segmentation fault in SSA Strength Reduction using -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81354 --- Comment #9 from Bill Schmidt --- OK, I've now confirmed this is the problem. I have a rough patch for trunk, and backporting it to GCC 5 r236439 verifies that this fixes it. Still verifying bootstrap/regression on trunk, and need to do some cleanup before submitting.
[Bug c++/81587] GCC doesn't warn about calling functions that don't exist
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81587 --- Comment #5 from Jonny Grant --- Thank you Martin, I raised Bug #81618
[Bug target/25967] Add attribute naked for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25967

--- Comment #14 from Uroš Bizjak ---
I'm testing the above patch. Using the patched compiler, the testcase that is mentioned by Daniel in Comment #12 can be changed to:

Index: testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c
===================================================================
--- testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c (revision 250720)
+++ testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c (working copy)
@@ -169,15 +169,9 @@
 
 #define TEST_DATA_OFFSET(f)((int)__builtin_offsetof(struct test_data, f))
 
-void __attribute__((used))
-do_test_body0 (void)
-{
-  __asm__ ("\n"
-       " .globl " ASMNAME(do_test_body) "\n"
-#ifdef __ELF__
-       " .type " ASMNAME(do_test_body) ",@function\n"
-#endif
-       ASMNAME(do_test_body) ":\n"
+void __attribute__((naked))
+do_test_body (void)
+{__asm__ (
        " # rax, r10 and r11 are usable here.\n"
        "\n"
        " # Save registers.\n"
@@ -212,9 +206,6 @@
        " call" ASMNAME(mem_to_regs) "\n"
        "\n"
        " retq\n"
-#ifdef __ELF__
-       " .size " ASMNAME(do_test_body) ",.-" ASMNAME(do_test_body) "\n"
-#endif
        :: "i"(TEST_DATA_OFFSET(regdata[REG_SET_SAVE])),
        "i"(TEST_DATA_OFFSET(regdata[REG_SET_INPUT])),
[Bug c++/81618] New: Warn for unused functions declared in local scope
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81618

Bug ID: 81618
Summary: Warn for unused functions declared in local scope
Product: gcc
Version: 5.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jg at jguk dot org
Target Milestone: ---

Hello

Could GCC warn for unused functions declared in local scope please? See function g() below.

$ cat t.C && gcc -S -Wall t.C
void f (void)
{
  typedef int I;
  int i;
  void g ();
}
t.C: In function ‘void f()’:
t.C:4:7: warning: unused variable ‘i’ [-Wunused-variable]
   int i;
       ^
t.C:3:15: warning: typedef ‘I’ locally defined but not used [-Wunused-local-typedefs]
   typedef int I;
               ^
[Bug target/25967] Add attribute naked for x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25967 --- Comment #13 from Uroš Bizjak --- Created attachment 41862 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41862&action=edit Patch that implements naked attribute
[Bug target/81614] Should -mtune-ctrl=partial_reg_stall be turned by default?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81614 --- Comment #3 from Cody Gray --- (In reply to Uroš Bizjak from comment #1) > Partial register stalls were discussed many times in the past, but > apparently the compiler still produces fastest code when partial register > stalls are enabled on latest target processors (e.g. -mtune=intel). I don't understand what that means. -mtune=intel does *not* fix the partial register stall problem. It should. All Intel CPUs prior to Haswell would absolutely experience partial register stalls on this code, resulting in a performance degradation. -mtune-ctrl=partial_reg_stall does get the correct code, but I wasn't aware of this option and I believe I shouldn't have to be. If a developer is getting sub-optimal code even when he is asking the compiler to tune for his specific microarchitecture, then the optimizer has a bug. This is not an issue where there are arguments on either side. There is absolutely no benefit to generating the code that the compiler currently does. It is the same number of bytes to OR the BYTE-sized registers as it is to OR the DWORD-sized registers, while the former will run faster on the vast majority of CPUs and won't be any slower on the others. > Also, it is hard to confirm tuning PRs without hard benchmark data. No, it really isn't. I know that's a canned response, likely brought about by hard-won experience with a lot of dubious "tuning" feature requests, but it's just a cop-out in this case, if not outright dismissive. Partial register stalls are a well-documented phenomenon, confirmed by multiple sources, and have been a significant source of performance degradation since the Pentium Pro was released circa 1995. Agner Fog's manuals, as cited above, are really the authoritative reference when it comes to performance tuning on x86, and they provide confirmation of this in spades. 
In fact, I would argue that an accurate conceptual understanding of the microarchitecture is often a better guide than one-off microbenchmarks, since the latter are so difficult to craft and therefore so often misleading. For example, the effects of the stall might be masked by the overhead of the function call, but when the code is inlined or *certainly* when it is executed within an inner loop, there will be a significant performance degradation. Again, if this were an issue where I was proposing bloating the size of the code for a small payoff in speed, I could see how you might be skeptical. But there is literally no downside to making this change. You could possibly argue that -mtune-ctrl=partial_reg_stall should not be turned on when tuning for Haswell and later microarchitectures, as Haswell was the first to alleviate the visible performance penalties associated with reading from a full 32-bit register after writing to a partial 8-bit "view" of that same register. However, this applies *only* to the low-byte register (e.g., AL, CL, DL, etc.). With the high-byte registers (e.g., AH, CH, DH, etc.), there is still a loss in performance because an extra µop has to be inserted between the write to the 8-bit register and the read from the 32-bit register. This increases the latency by one clock cycle, and so unless the xH partial registers are treated differently from the xL partial registers, applying the optimizations described would still result in a performance win, especially since there is no drawback.
[Bug go/81617] New: mksigtab.sh fails to resolve NSIG with glibc 2.26
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81617 Bug ID: 81617 Summary: mksigtab.sh fails to resolve NSIG with glibc 2.26 Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: go Assignee: ian at airs dot com Reporter: sch...@linux-m68k.org CC: cmang at google dot com Target Milestone: --- In glibc 2.26 the value of _NSIG is now defined as an expression of __SIGRTMAX instead of a simple number. $ grep 'NSIG =' gen-sysinfo.go const _NSIG = __NSIG const __NSIG = (___SIGRTMAX + 1)
[Bug tree-optimization/81354] [5/6 Regression] Segmentation fault in SSA Strength Reduction using -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81354 --- Comment #8 from Bill Schmidt --- This is likely the same as another problem that recently came up (not yet filed as the source is sensitive). SLSR is sensitive to addresses of PHI instructions remaining the same throughout the pass, but gimple_split_edge does not maintain this. I'm working on a patch to ensure that it does. I still need to verify this is the same issue.
[Bug c/64619] No -Wsign-conversion warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64619 Eric Gallager changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-07-30 CC||egallager at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from Eric Gallager --- (In reply to Mikhail Maltsev from comment #1) > Indeed, confirmed on recent revision, r219801. Changing status to NEW then.
[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 H.J. Lu changed: What|Removed |Added CC||cody at codygray dot com --- Comment #1 from H.J. Lu --- *** Bug 81614 has been marked as a duplicate of this bug. ***
[Bug target/81614] Should -mtune-ctrl=partial_reg_stall be turned by default?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81614

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE
            Summary|x86 optimizer combines      |Should
                   |results of comparisons in a |-mtune-ctrl=partial_reg_sta
                   |way that risks partial      |ll be turned by default?
                   |register stalls             |

--- Comment #2 from H.J. Lu ---
With -mtune-ctrl=partial_reg_stall, I got

[hjl@gnu-tools-1 pr81614]$ cat x.i
_Bool foo(int a, int b, int c)
{
  return (a == c || b == c);
}

int bar (int a, int b, int c)
{
  return (a == c || b == c);
}
[hjl@gnu-tools-1 pr81614]$ make
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2 -mtune-ctrl=partial_reg_stall -S x.i
[hjl@gnu-tools-1 pr81614]$ cat x.s
        .file   "x.i"
        .text
        .p2align 4,,15
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        cmpl    %edx, %edi
        sete    %al
        cmpl    %esi, %edx
        sete    %dl
        orb     %dl, %al
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .p2align 4,,15
        .globl  bar
        .type   bar, @function
bar:
.LFB1:
        .cfi_startproc
        cmpl    %edx, %edi
        sete    %al
        cmpl    %esi, %edx
        sete    %dl
        orb     %dl, %al
        movzbl  %al, %eax
        ret
        .cfi_endproc
.LFE1:
        .size   bar, .-bar
        .ident  "GCC: (GNU) 8.0.0 20170730 (experimental)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-tools-1 pr81614]$

I opened PR 81616 to update default tuning options.

*** This bug has been marked as a duplicate of bug 81616 ***
[Bug c/69389] bit field incompatible with OpenMP atomic update
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69389 Eric Gallager changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-07-30 CC||egallager at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Eric Gallager --- Confirmed.
[Bug target/81616] New: Update -mtune=generic for the current Intel and AMD processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 Bug ID: 81616 Summary: Update -mtune=generic for the current Intel and AMD processors Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hjl.tools at gmail dot com CC: pavel.v.chupin at gmail dot com Blocks: 80820 Target Milestone: --- Target: x86 -mtune=generic should be updated for the current Intel and AMD processors. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80820 [Bug 80820] _mm_set_epi64x shouldn't store/reload for -mtune=haswell, Zen should avoid store/reload, and generic should think about it.
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #5 from PeteVine --- Turns out the GCC 8 regression is caused by the +crc switch in -march=armv8-a+crc. Interesting, eh?
[Bug fortran/81615] New: save-temps and gfortran produces *.f90 files instead of *.i or *i90 files
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81615

Bug ID: 81615
Summary: save-temps and gfortran produces *.f90 files instead of *.i or *i90 files
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: barrowes at alum dot mit.edu
Target Milestone: ---

If I have a file test1.f:

      program test2
      real x
      x=3.0
C#ifdef DDD
C      x=2.0
C#endif
      print *,'x=',x
      end

and I compile it with:

gfortran -save-temps -o test0 test0.f

I get two temporary files, test0.o and test0.s.

If I uncomment the directive:

      program test2
      real x
      x=3.0
#ifdef MPI
      x=2.0
#endif
      print *,'x=',x
      end

and compile with:

gfortran -cpp -save-temps -o test0 test0.f

In addition to the two temporary files above, a test0.f90 is produced that looks like:

# 1 "test0.f"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test0.f"
      program test2
      real x
      x=3.0
      print *,'x=',x
      end

I was under the impression that I would get a test0.i file since the only documentation of using -save-temps I can find comes from the gcc docs:
https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-save-temps
which mentions *.i files in their example.

And if I have a file test1.f90:

program test1
real x
x=3.0
#ifdef MPI
x=2.0
#endif
print *,'x=',x
end

and I compile with:

gfortran -cpp -save-temps -o test1 test1.f90

The test1.o and test1.s files are produced, but no preprocessed fortran source file is produced (I suppose because the source file already has the f90 extension).

How can I get a preprocessed source file in this case?
Where is this behavior of -save-temps producing *.f90 files documented?
Can I change the f90 extension of the preprocessed temporary files to i or i90 instead of f90?
[Bug target/81614] x86 optimizer combines results of comparisons in a way that risks partial register stalls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81614

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #1 from Uroš Bizjak ---
This transformation is handled by the -mtune-ctrl=partial_reg_stall tune flag (and more specifically, the -mtune-ctrl=^promote_qimode flag).

Partial register stalls were discussed many times in the past, but apparently the compiler still produces the fastest code when partial register stalls are enabled on the latest target processors (e.g. -mtune=intel).

BTW, there are quite a few flags in x86-tune.def under:

/*****************************************************************************/
/* Historical relics: tuning flags that helps a specific old CPU designs     */
/*****************************************************************************/

where nobody bothered to change the defaults for new processors.

Also, it is hard to confirm tuning PRs without hard benchmark data.

Adding CC.
[Bug c/77328] incorrect caret location in -Wformat calling printf via a macro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77328

Eric Gallager changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-07-30
                 CC|                            |egallager at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Eric Gallager ---
Confirmed.
[Bug target/79793] Incorrect stack alignment for interrupt handler in 64-bit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79793

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #20 from H.J. Lu ---
Fixed for GCC 8.
[Bug target/79793] Incorrect stack alignment for interrupt handler in 64-bit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79793

--- Comment #19 from hjl at gcc dot gnu.org ---
Author: hjl
Date: Sun Jul 30 14:10:32 2017
New Revision: 250721

URL: https://gcc.gnu.org/viewcvs?rev=250721&root=gcc&view=rev
Log:
i386: Update INCOMING_FRAME_SP_OFFSET for exception handler

Since there is an extra error code passed to the exception handler,
INCOMING_FRAME_SP_OFFSET is return address plus error code for the
exception handler.  This patch updates INCOMING_FRAME_SP_OFFSET to
the correct value for the exception handler.

This patch exposed a bug in DWARF stack frame CFI generation, which
assumes that INCOMING_FRAME_SP_OFFSET is the same for all functions:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81570

It sets and caches the incoming stack frame offset with the same
INCOMING_FRAME_SP_OFFSET for all functions.  When there are both
exception handler and normal function in the same input, the wrong
incoming stack frame offset is used for exception handler or normal
function, which leads to

FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 error == 0x12345670
FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 frame->ip == 0x12345671
FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 frame->cs == 0x12345672
FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 frame->flags == 0x12345673
FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 frame->sp == 0x12345674
FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 frame->ss == 0x12345675

With the patch for PR 81570:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01851.html

applied, there are no regressions on i686 and x86-64.

gcc/

    PR target/79793
    * config/i386/i386.c (ix86_function_arg): Update arguments for
    exception handler.
    (ix86_compute_frame_layout): Set the initial stack offset to
    INCOMING_FRAME_SP_OFFSET.  Update red-zone offset with
    INCOMING_FRAME_SP_OFFSET.
    (ix86_expand_epilogue): Don't pop the 'ERROR_CODE' off the
    stack before exception handler returns.
    * config/i386/i386.h (INCOMING_FRAME_SP_OFFSET): Add the
    'ERROR_CODE' for exception handler.

gcc/testsuite/

    PR target/79793
    * gcc.dg/guality/pr68037-1.c: Update gdb breakpoints.
    * gcc.target/i386/interrupt-5.c (interrupt_frame): New struct.
    (foo): Check the builtin return address against the return
    address in interrupt frame.
    * gcc.target/i386/pr79793-1.c: New test.
    * gcc.target/i386/pr79793-2.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr79793-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr79793-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/guality/pr68037-1.c
    trunk/gcc/testsuite/gcc.target/i386/interrupt-5.c
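As a rough illustration of the sentence "INCOMING_FRAME_SP_OFFSET is return address plus error code for the exception handler", the arithmetic can be modelled in a few lines of C. This is only a sketch and not GCC source: the function name incoming_frame_sp_offset, its parameters, and the assumption that both the return address and the error code occupy exactly one machine word each are mine, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not GCC source: models the value the commit log
   describes for INCOMING_FRAME_SP_OFFSET.  Assumption: a normal function
   has only the return address above the incoming stack pointer, while an
   exception handler additionally has the error code, each one word wide. */
size_t incoming_frame_sp_offset(size_t word_size, bool is_exception_handler)
{
    size_t slots = is_exception_handler ? 2 : 1;
    return slots * word_size;
}

/* Sanity check for 4-byte (i686) and 8-byte (x86-64) words. */
void check_incoming_offsets(void)
{
    assert(incoming_frame_sp_offset(4, false) == 4);
    assert(incoming_frame_sp_offset(4, true) == 8);
    assert(incoming_frame_sp_offset(8, false) == 8);
    assert(incoming_frame_sp_offset(8, true) == 16);
}
```

The point of the model is only that the offset differs per function kind, which is why caching a single value for all functions (PR 81570) goes wrong when both kinds appear in one input.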
[Bug debug/81570] create_pseudo_cfg assumes that INCOMING_FRAME_SP_OFFSET is a constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81570

--- Comment #3 from hjl at gcc dot gnu.org ---
Author: hjl
Date: Sun Jul 30 14:10:32 2017
New Revision: 250721

URL: https://gcc.gnu.org/viewcvs?rev=250721&root=gcc&view=rev
Log:
i386: Update INCOMING_FRAME_SP_OFFSET for exception handler

Since there is an extra error code passed to the exception handler,
INCOMING_FRAME_SP_OFFSET is return address plus error code for the
exception handler.  This patch updates INCOMING_FRAME_SP_OFFSET to
the correct value for the exception handler.

This patch exposed a bug in DWARF stack frame CFI generation, which
assumes that INCOMING_FRAME_SP_OFFSET is the same for all functions:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81570

It sets and caches the incoming stack frame offset with the same
INCOMING_FRAME_SP_OFFSET for all functions.  When there are both
exception handler and normal function in the same input, the wrong
incoming stack frame offset is used for exception handler or normal
function, which leads to

FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 error == 0x12345670
FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 frame->ip == 0x12345671
FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 frame->cs == 0x12345672
FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 frame->flags == 0x12345673
FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 frame->sp == 0x12345674
FAIL: gcc.dg/guality/pr68037-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 33 frame->ss == 0x12345675

With the patch for PR 81570:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01851.html

applied, there are no regressions on i686 and x86-64.

gcc/

    PR target/79793
    * config/i386/i386.c (ix86_function_arg): Update arguments for
    exception handler.
    (ix86_compute_frame_layout): Set the initial stack offset to
    INCOMING_FRAME_SP_OFFSET.  Update red-zone offset with
    INCOMING_FRAME_SP_OFFSET.
    (ix86_expand_epilogue): Don't pop the 'ERROR_CODE' off the
    stack before exception handler returns.
    * config/i386/i386.h (INCOMING_FRAME_SP_OFFSET): Add the
    'ERROR_CODE' for exception handler.

gcc/testsuite/

    PR target/79793
    * gcc.dg/guality/pr68037-1.c: Update gdb breakpoints.
    * gcc.target/i386/interrupt-5.c (interrupt_frame): New struct.
    (foo): Check the builtin return address against the return
    address in interrupt frame.
    * gcc.target/i386/pr79793-1.c: New test.
    * gcc.target/i386/pr79793-2.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr79793-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr79793-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/guality/pr68037-1.c
    trunk/gcc/testsuite/gcc.target/i386/interrupt-5.c
[Bug c/70502] inconsistent behavior of -Werror=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70502

Eric Gallager changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |egallager at gcc dot gnu.org
         Resolution|---                         |DUPLICATE

--- Comment #1 from Eric Gallager ---
Looks like a dup of bug 55976 to me

*** This bug has been marked as a duplicate of bug 55976 ***
[Bug c/55976] -Werror=return-type should error on returning a value from a void function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55976

Eric Gallager changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |manu at gcc dot gnu.org

--- Comment #3 from Eric Gallager ---
*** Bug 70502 has been marked as a duplicate of this bug. ***
[Bug c/71996] -fdump-translation-unit fails to dump string literals of type char16_t/char32_t/wchar_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71996

Eric Gallager changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egallager at gcc dot gnu.org

--- Comment #1 from Eric Gallager ---
My version of gcc trunk doesn't recognize the flag ‘-fdump-translation-unit=stdout’; I think I remember reading on the mailing lists that it was going to be removed for gcc8...
[Bug libfortran/78449] compile time ieee_support_halting is not correct on arm and aarch64 ( FAIL: gfortran.dg/ieee/ieee_8.f90 -Os execution test )
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78449

Richard Earnshaw changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |7.0
[Bug middle-end/80929] [6/7/8 Regression] Division with constant no more optimized to mult highpart
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80929

Georg-Johann Lay changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-07-30
            Summary|[7/8 Regression] Division   |[6/7/8 Regression] Division
                   |with constant no more       |with constant no more
                   |optimized to mult highpart  |optimized to mult highpart
     Ever confirmed|0                           |1

--- Comment #7 from Georg-Johann Lay ---
v4.7 generates the best code: the 2 div+mod 60 are implemented as 2 mul-highpart.

v6 tries to be overly smart by fusing the two divisions by 60 into one division by 3600, leaving 1 slow divmod call *and* 2 mul-highpart for the 2 modulo 60.

v8 also fuses to a slow division by 3600, and additionally fails to use mul-highpart for the 2nd mod 60.
[Bug rtl-optimization/81611] gcc un-learned loop / post-increment optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81611

Georg-Johann Lay changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-07-30
     Ever confirmed|0                           |1
[Bug c/71870] wrong location of "%n$" directive in -Wformat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71870

Eric Gallager changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-07-30
                 CC|                            |egallager at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Eric Gallager ---
Confirmed. Note that the original testcase now prints an additional -Wformat-overflow warning:

$ /usr/local/bin/gcc -c -Wall -Wextra -Wpedantic -S 71870.c
71870.c: In function ‘f’:
71870.c:5:26: warning: unknown conversion type character ‘r’ in format [-Wformat=]
  __builtin_sprintf (d, "%r");
                          ^
71870.c:7:2: warning: ISO C does not support %n$ operand number formats [-Wformat=]
  __builtin_sprintf (d, "%2$i%1$i", 1, 234);
  ^
71870.c:7:33: warning: ‘__builtin_sprintf’ writing a terminating nul past the end of the destination [-Wformat-overflow=]
  __builtin_sprintf (d, "%2$i%1$i", 1, 234);
                                 ^
71870.c:7:2: note: ‘__builtin_sprintf’ output 5 bytes into a destination of size 4
  __builtin_sprintf (d, "%2$i%1$i", 1, 234);
  ^
$
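The "output 5 bytes into a destination of size 4" note can be double-checked by asking snprintf for the length without writing anything. A minimal sketch follows; the helper name bytes_needed is made up here, and the %n$ positional syntax is a POSIX extension (as the -Wpedantic warning above says, ISO C does not support it), so this assumes a C library such as glibc that implements it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns how many bytes, including the terminating
   nul, the format "%2$i%1$i" needs for the given arguments, or -1 if
   snprintf reports an error.  Relies on POSIX positional arguments. */
int bytes_needed(int first, int second)
{
    /* With a NULL buffer and size 0, snprintf only computes the length. */
    int len = snprintf(NULL, 0, "%2$i%1$i", first, second);
    return len < 0 ? -1 : len + 1;
}

/* The testcase arguments: "234" followed by "1" is 4 characters plus nul. */
void check_bytes_needed(void)
{
    assert(bytes_needed(1, 234) == 5);
}
```

With the arguments from the testcase the result is 5, one more than the 4-byte destination, which agrees with the -Wformat-overflow diagnostic.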
[Bug middle-end/80929] [7/8 Regression] Division with constant no more optimized to mult highpart
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80929

--- Comment #6 from Georg-Johann Lay ---
Created attachment 41861
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41861&action=edit
time-i.c: C test case

(In reply to Richard Biener from comment #4)
> Fixed?

No. The attached test case

$ avr-gcc-8 time-i.c -mmcu=atmega168 -O2 -S -dp

Still uses slow __[u]divmodhi when optimizing for speed. The code has 2 divisions and modulo with 60. The first mod is expanded as mul highpart (insn 25) but the second is expanded as __divmodhi4 call (insn 67):

timeid_add:
    ...
    ldi r26,lo8(-119)    ; 24 *movhi/5 [length = 2]
    ldi r27,lo8(-120)
    call __umulhisi3     ; 25 *umulhi3_highpart_call [length = 2]
    ...
    ldi r22,lo8(16)      ; 61 *movhi/5 [length = 2]
    ldi r23,lo8(14)
    call __udivmodhi4    ; 62 *udivmodhi4_call [length = 2]
    std Z+2,r22          ; 34 movqi_insn/3 [length = 1]
    movw r24,r18         ; 65 *movhi/1 [length = 1]
    ldi r22,lo8(60)      ; 66 *movhi/5 [length = 2]
    ldi r23,0
    call __divmodhi4     ; 67 *divmodhi4_call [length = 2]
    std Z+1,r24          ; 50 movqi_insn/3 [length = 1]
/* epilogue start */
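For readers who have not seen the mul-highpart trick, here is a small C sketch of dividing a 16-bit unsigned value by 60 without a division call. The multiplier 0x8889 is what lo8(-119)/lo8(-120) in the listing above load into r27:r26; the extra shift by 5 is not visible in the quoted excerpt and is my own derivation (0x8889 is the rounded-up value of 2^21 / 60), and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* x / 60 for 16-bit unsigned x: 16x16 -> 32-bit multiply, keep the high
   half, then shift right by 5 (a total shift of 21 bits). */
uint16_t div60_mulhi(uint16_t x)
{
    uint32_t product = (uint32_t)x * 0x8889u;
    uint16_t high = (uint16_t)(product >> 16);   /* the "highpart" */
    return (uint16_t)(high >> 5);
}

/* x % 60 recovered from the quotient with one small multiply. */
uint16_t mod60_mulhi(uint16_t x)
{
    return (uint16_t)(x - 60u * div60_mulhi(x));
}

/* Exhaustive comparison against the plain / and % operators.
   Returns 1 if every 16-bit input agrees, 0 otherwise. */
int div60_matches_everywhere(void)
{
    for (uint32_t x = 0; x <= UINT16_MAX; ++x) {
        if (div60_mulhi((uint16_t)x) != x / 60u)
            return 0;
        if (mod60_mulhi((uint16_t)x) != x % 60u)
            return 0;
    }
    return 1;
}
```

The exhaustive loop is cheap (65536 iterations) and is the simplest way to convince oneself the constant and shift are right for the full 16-bit range.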
[Bug target/81614] New: x86 optimizer combines results of comparisons in a way that risks partial register stalls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81614

Bug ID: 81614
Summary: x86 optimizer combines results of comparisons in a way that risks partial register stalls
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: cody at codygray dot com
Target Milestone: ---
Target: i?86-*-*

Consider the following code:

bool foo(int a, int b, int c)
{
    // It doesn't matter if this short-circuits ('||' vs. '|')
    // because the optimizer treats them as equivalent.
    return (a == c || b == c);
}

All versions of GCC (going back to at least 4.4.7 and forward to the current 8.0 preview) translate this to the following optimized assembly on x86 targets:

foo(int, int, int):
    movl    12(%esp), %edx
    cmpl    %edx, 4(%esp)
    sete    %al
    cmpl    8(%esp), %edx
    sete    %dl
    orl     %edx, %eax
    ret

The problem here is the second-to-last instruction. It ORs together two full 32-bit registers, even though the preceding SETE instructions only set the low 8 bits of each register.

This results in a speed-zapping phenomenon on virtually all x86 processors called a *partial register stall*. (See http://www.agner.org/optimize/microarchitecture.pdf for details on exactly how this is a performance problem on various implementations of x86. Although there are differences in exactly *why* it is a speed penalty, it virtually always is and *certainly* should be considered one when the output is tuned for a generic x86 target.)

You get the same results at all optimization levels, including -Os (at least, the relevant portion of the code is the same). You also see this for x86-64 targets:

foo(int, int, int):
    cmpl    %edx, %edi
    sete    %al
    cmpl    %esi, %edx
    sete    %dl
    orl     %edx, %eax
    ret

One of two things should be done instead: either (A) perform the bitwise operation *only* on the low bytes, or (B) pre-zero the entire 32-bit register *before* setting its low byte to break dependencies.

Proposed Resolution A (use only low bytes):

foo(int, int, int):
    movl    12(%esp), %edx
    cmpl    %edx, 4(%esp)
    sete    %al
    cmpl    8(%esp), %edx
    sete    %dl
    orb     %dl, %al
    ret

Proposed Resolution B (pre-zero to break dependencies):

foo(int, int, int):
    movl    12(%esp), %edx
    xorl    %eax, %eax
    cmpl    %edx, 4(%esp)
    sete    %al
    xorl    %ecx, %ecx
    cmpl    8(%esp), %edx
    sete    %cl
    orl     %ecx, %eax
    ret

Approach A is the one used by Clang and MSVC. It solves the problem of partial register stalls while avoiding the need for a third register as in Approach B.

The disadvantage of Approach A is that it creates only a byte-sized (8-bit) result. This is perfectly fine if the function returns a bool, but doesn't work if the function returns an integer type. There are two ways to solve that. What GCC currently does if you change foo() to return int is add a MOVZBL instruction between the OR and RET:

foo(int, int, int):
    movl    12(%esp), %edx
    cmpl    %edx, 4(%esp)
    sete    %al
    cmpl    8(%esp), %edx
    sete    %dl
    orl     %edx, %eax
    movzbl  %al, %eax
    ret

This zero-extends the result in AL into EAX. (Notice that the partial register stall hazard is still there.) This existing behavior could simply be maintained. However, it would be more optimal to pre-zero as shown in Approach B. (For details on why this would be more optimal on all x86 microarchitectures, see here: https://stackoverflow.com/a/33668295).
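
For anyone who wants to reproduce the listings, a C file along the following lines should be enough. The names foo_logical, foo_bitwise and foo_int are made up for this sketch (the report only has foo), and compiling with something like gcc -O2 -S and comparing the three functions is the assumed workflow, not something the report prescribes.

```c
#include <assert.h>
#include <stdbool.h>

/* Short-circuit form, as in the report. */
bool foo_logical(int a, int b, int c)
{
    return (a == c || b == c);
}

/* Bitwise form; the operands have no side effects, so the result is the
   same as the short-circuit form. */
bool foo_bitwise(int a, int b, int c)
{
    return (a == c) | (b == c);
}

/* Variant returning int, i.e. the case where the byte-sized result has
   to be widened to a full register. */
int foo_int(int a, int b, int c)
{
    return (a == c) | (b == c);
}

/* Checks that all three forms agree over a small grid of inputs. */
void check_forms_agree(void)
{
    for (int a = -2; a <= 2; ++a)
        for (int b = -2; b <= 2; ++b)
            for (int c = -2; c <= 2; ++c) {
                assert(foo_logical(a, b, c) == foo_bitwise(a, b, c));
                assert(foo_int(a, b, c) == (int)foo_logical(a, b, c));
            }
}
```

Keeping the bool and int variants side by side makes it easy to see in the generated assembly whether the zero-extension is done after the OR or by pre-zeroing the registers.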
[Bug c++/81514] g++.dg/lookup/missing-std-include-2.C FAILs on Solaris
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81514

--- Comment #3 from ro at CeBiTec dot Uni-Bielefeld.DE ---
> --- Comment #2 from David Malcolm ---
> Candidate patch: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01858.html

I've included the patch in this weekend's Solaris bootstraps and the failures are indeed gone. Thanks.

        Rainer
[Bug c/51515] Unable to forward declare nested functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51515

SztfG at yandex dot ru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |SztfG at yandex dot ru

--- Comment #2 from SztfG at yandex dot ru ---
But why doesn't this compile?

void f()
{
  typedef auto void (*func)();
  func g(void);
  g();
  func g() {}
}
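
For comparison, the form the GCC manual documents for forward-declaring a nested function puts auto directly on the function declaration rather than inside a typedef. A minimal sketch, GNU C only (nested functions are a GCC extension, so this is not expected to build with other compilers); the names call_forward_declared and add_one are invented for the example. Note also that in the snippet above func is a pointer-to-function type, so "func g(void);" declares a function returning such a pointer, which is a different type from a nested "void g(void)".

```c
#include <assert.h>

/* Calls a nested function that is declared with 'auto' before its
   definition appears later in the same block (GNU C extension). */
int call_forward_declared(int x)
{
    auto int add_one(int);          /* forward declaration */

    int result = add_one(x);        /* use before the definition */

    int add_one(int v)              /* the nested definition */
    {
        return v + 1;
    }

    return result;
}

/* Small self-check. */
void check_forward_declared(void)
{
    assert(call_forward_declared(3) == 4);
}
```

The nested function is only called directly here, never through a pointer, so no trampoline is involved.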