[Bug lto/83388] New: reference statement index not found error with -fsanitize=null
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83388

Bug ID: 83388
Summary: reference statement index not found error with -fsanitize=null
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
Target Milestone: ---

Created attachment 42844
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42844&action=edit
test case

With the attached test case

gcc8 -m32 -O2 -flto -fsanitize=null -c core.i
gcc8 -r -nostdlib core.o

gives

In function 'i':
lto1: fatal error: Reference statement index not found
compilation terminated.

Happens with gcc 7 and trunk
[Bug lto/83376] ICE in LTO streamer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83376

Andi Kleen changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
Resolution |---         |INVALID

--- Comment #1 from Andi Kleen ---
Looks like it was a case of incompatible LTO object file from a different gcc build. With a clean build it doesn't happen anymore.
[Bug lto/83380] New: disk full while writing LTO files leads to ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83380

Bug ID: 83380
Summary: disk full while writing LTO files leads to ICE
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
Target Milestone: ---

lto1: fatal error: error writing to vmlinux.ltrans15.s: No space left on device
gcc: internal compiler error: Aborted signal terminated program lto1
Please submit a full bug report, with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.

Should just exit in this case
[Bug gcov-profile/83355] autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355 --- Comment #3 from Andi Kleen --- patch checked in
[Bug lto/83376] New: ICE in LTO streamer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83376

Bug ID: 83376
Summary: ICE in LTO streamer
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
Target Milestone: ---

Don't have a small test case right now, but will bisect

When building Linux kernel LTO with gcc 8 I currently get an ICE. Doesn't happen on 7 and I think it's also recent on 8.

In this case data_in->current_decl_data is NULL while reading a reference.

0xa58fe7 crash_signal
        ../../gcc/gcc/toplev.c:325
0x957a39 lto_file_decl_data_get_var_decl
        ../../gcc/gcc/lto-streamer.h:1210
0x957a39 lto_input_tree_ref(lto_input_block*, data_in*, function*, LTO_tags)
        ../../gcc/gcc/lto-streamer-in.c:366
0x957c1d lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int)
        ../../gcc/gcc/lto-streamer-in.c:1475
0x6bdc8c lto_read_decls
        ../../gcc/gcc/lto/lto.c:1791
0x6bdc8c lto_file_finalize
        ../../gcc/gcc/lto/lto.c:2055
0x6bdc8c lto_create_files_from_ids
        ../../gcc/gcc/lto/lto.c:2065
0x6bdc8c lto_file_read
        ../../gcc/gcc/lto/lto.c:2106
0x6bdc8c read_cgraph_and_symbols
        ../../gcc/gcc/lto/lto.c:2818
0x6bfdb1 lto_main()
        ../../gcc/gcc/lto/lto.c:3323
[Bug lto/83375] partitioner partitions static arrays with label references
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375 --- Comment #1 from Andi Kleen --- Actually -flto-partition=max
[Bug lto/83375] New: partitioner partitions static arrays with label references
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375

Bug ID: 83375
Summary: partitioner partitions static arrays with label references
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
Target Milestone: ---

Created attachment 42842
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42842&action=edit
test case

I thought there was already a bug for this, but can't find it right now.

When &&label references are put into static arrays the LTO partitioner can put the static into a different partition than the function, which causes an assembler error because the code labels are local. This breaks Linux kernel LTO builds. See attached test case.

I think ipa-comdats should put the function and the static into the same partition, but for some reason it doesn't work.

Attached test case shows the problem with -flto-partition=1to1 -flto -O2
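For readers unfamiliar with the pattern, here is a minimal sketch of a static array holding label addresses. It is not the attached test case; the opcode names and the `run` function are hypothetical, and it relies on the GNU C "labels as values" extension (`&&label` and `goto *`), which GCC and Clang accept. The labels are local to the function, which is why the table cannot be emitted into a different object file than the function body.

```c
/* Hypothetical opcodes for a tiny bytecode interpreter. */
enum { OP_INC = 0, OP_DEC = 1, OP_HALT = 2 };

/* Runs a program and returns the accumulator.
   Assumes the program is terminated by OP_HALT. */
static int run(const unsigned char *ops)
{
    /* Static array of label addresses (GNU C extension).
       The labels are local symbols of run(), so this table has to be
       assembled together with the function body. */
    static void *const dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
    int acc = 0;

    goto *dispatch[*ops];

op_inc:
    acc++;
    goto *dispatch[*++ops];
op_dec:
    acc--;
    goto *dispatch[*++ops];
op_halt:
    return acc;
}
```

Calling `run` on the byte sequence `OP_INC, OP_INC, OP_DEC, OP_INC, OP_HALT` walks the table four times and returns the accumulator.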
[Bug gcov-profile/83355] New: autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355

Bug ID: 83355
Summary: autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: gcov-profile
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
Target Milestone: ---

Running in gdb shows that there is a very deep recursion in get_index_by_decl until it overflows the stack.

This patch seems to fix it (but not sure why the abstract origin would point to itself)

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 5134a795331..403709bad6b 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -477,7 +477,7 @@ string_table::get_index_by_decl (tree decl) const
   ret = get_index (lang_hooks.dwarf_name (decl, 0));
   if (ret != -1)
     return ret;
-  if (DECL_ABSTRACT_ORIGIN (decl))
+  if (DECL_ABSTRACT_ORIGIN (decl) && DECL_ABSTRACT_ORIGIN (decl) != decl)
     return get_index_by_decl (DECL_ABSTRACT_ORIGIN (decl));
   return -1;

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x016c4ab2 in pp_emit_prefix (pp=0x229b1a0 ) at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1485
1485    {
(gdb) up
#1  0x016c4c90 in pp_append_text(pretty_printer*, char const*, char const*) () at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1556
1556      pp_emit_prefix (pp);
(gdb) bt
#0  0x016c4ab2 in pp_emit_prefix (pp=0x229b1a0 ) at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1485
#1  0x016c4c90 in pp_append_text(pretty_printer*, char const*, char const*) () at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1556
#2  0x00b12c83 in pp_c_identifier (pp=0x229b1a0 , id=) at /home/andi/gcc/git/gcc/gcc/c-family/c-pretty-print.c:1203
#3  0x00992b46 in dump_decl (flags=0, t=0x76d2ce40, pp=0x229b1a0 ) at /home/andi/gcc/git/gcc/gcc/tree.h:3226
#4  dump_function_name(cxx_pretty_printer*, tree_node*, int) () at /home/andi/gcc/git/gcc/gcc/cp/error.c:1852
#5  0x009940a4 in lang_decl_name(tree_node*, int, bool) () at /home/andi/gcc/git/gcc/gcc/cp/error.c:3005
#6  0x00994133 in lang_decl_dwarf_name (decl=, v=, translate=) at /home/andi/gcc/git/gcc/gcc/cp/error.c:2977
#7  0x0156762a in autofdo::string_table::get_index_by_decl(tree_node*) const () at /home/andi/gcc/git/gcc/gcc/auto-profile.c:477
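The failure mode in this report (a lookup that recurses through an "origin" pointer which points back at the same object) can be shown outside GCC with a small stand-alone sketch. Everything below is hypothetical: `struct decl`, `table_index` and `index_by_decl` are illustrative stand-ins, not GCC's tree or string_table API. The extra `!= d` comparison mirrors the guard added by the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a declaration with an abstract origin. */
struct decl {
    const char *name;
    const struct decl *abstract_origin; /* NULL, another decl, or the decl itself */
};

/* Linear search in a name table; returns the index or -1. */
static int table_index(const char *const *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i], name) == 0)
            return (int)i;
    return -1;
}

/* Looks up a decl by name and falls back to its abstract origin.
   Without the "!= d" check a self-referential origin recurses
   until the stack overflows. */
static int index_by_decl(const char *const *table, size_t n,
                         const struct decl *d)
{
    int ret = table_index(table, n, d->name);
    if (ret != -1)
        return ret;
    if (d->abstract_origin && d->abstract_origin != d)
        return index_by_decl(table, n, d->abstract_origin);
    return -1;
}
```

Note that the guard only covers a decl that is its own origin; a longer cycle through several decls would still recurse without bound, and the report gives no indication whether such cycles can occur.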
[Bug ipa/83346] inliner crash with always inline and templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83346

--- Comment #1 from Andi Kleen ---
This fixes it. Don't know why that node has no decl. Will submit after a test cycle.

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 7846e93d119..dcd8a3de1ac 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2391,7 +2391,8 @@ ipa_inline (void)
         entry of cycles, possibly cloning that entry point and
         try to flatten itself turning it into a self-recursive
         function.  */
-      if (lookup_attribute ("flatten",
+      if (node->decl
+          && lookup_attribute ("flatten",
                            DECL_ATTRIBUTES (node->decl)) != NULL)
        {
          if (dump_file)
[Bug ipa/83346] New: inliner crash with always inline and templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83346

Bug ID: 83346
Summary: inliner crash with always inline and templates
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
Target Milestone: ---

Created attachment 42820
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42820&action=edit
test case

Attached test case segfaults with -O2 on gcc 7 and 8 trunk

g++ -O2 -S ch-crash.i
ch-crash.i:30:1: internal compiler error: Segmentation fault
 }
 ^
0xc030f7 crash_signal
        ../../gcc/gcc/toplev.c:325
0x125b189 ipa_inline
        ../../gcc/gcc/ipa-inline.c:2388
0x125b189 execute
        ../../gcc/gcc/ipa-inline.c:2807
[Bug target/83052] [8 Regression] ICE in extract_insn, at recog.c:2305 starting from r254560
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83052 --- Comment #1 from Andi Kleen --- I'm not sure why you call it a regression? You must be running the test suite manually with the new option. I haven't tested, but likely it will fail if you run that test with -mcmodel=large too. The -mforce-indirect-call patch is really only a subset of -mcmodel=large. Then it would be more a latent bug.
[Bug tree-optimization/82854] New: more missing simplifcations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82854

Bug ID: 82854
Summary: more missing simplifcations
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

These all come from a paper "Optgen: A Generator for Local Optimizations" (Buchwald et.al.).
https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald15cc.pdf

These were found by a SAT solver. I wrote them in partial pseudo match.pd syntax (untested, likely buggy)

I'm not sure how useful they are really for real programs, but with the auto generated matchers scaling well to more rules they wouldn't hurt I suppose.

/* x + (x & 0x8000) -> x & 0x7fff */
(simplify
  (plus:c @0 (bit_and @0 integer_msb_onlyp@1))
  (bit_and @0 { @1 - 1; } ))

/* (x | 0x8000) + 0x8000 -> x & 0x7FFF */
(simplify
  (plus:c (bit_ior @0 integer_msb_onlyp) msb_setp)
  (bit_and @0 { msb_minus_one_val(type); } ))

/* x & (x + 0x8000) -> x & 0x7FFF */
(simplify
  (bit_and:c (plus @0 msb_setp) @0)
  (bit_and @0 { msb_minus_one_val(type); } ))

/* x & (0x7FFF - x) -> x & 0x8000 */
(simplify
  (bit_and:c @0 (minus msb_minus_onep @0))
  (bit_and @0 { msb_val(type); } ))

/* is_power_of_2(c1) && c0 & (2 * c1 - 1) == c1 - 1 -> (c0 - x) & c1 -> x & c1 */

/* x | (x + 0x8000) -> x | 0x8000 */
(simplify
  (bit_ior:c @0 (plus @0 msb_onlyp))
  (bit_ior @0 { msb_val(type); } ))

/* x | (0x7FFF - x) -> x | 0x7FFF */
(simplify
  (bit_ior:c @0 (minus 0x7FFF @0))
  (bit_ior @0 0x7FFF))

/* x | (x ^ y) -> x | y */
(simplify
  (bit_ior:c @0 (bit_xor:c @0 @1))
  (bit_ior @0 @1))

/* ((c0 | -c0) & ∼c1) == 0 AND (x + c0) | c1 -> x | c1 */

/* is_power_of_2(∼c1) && c0 & (2 * ∼c1 - 1) == ∼c1 - 1 AND (c0 - x) | c1 -> x | c1 */

/* -x | 0xFFFE -> x | 0xFFFE */
(simplify
  (bit_or (negate @0) 0xFFFE)
  (bit_or @0 0xFFFE))

/* 0 - (x & 0x8000) -> x & 0x8000 */
(simplify
  (minus 0 (bit_and:c @0 0x8000))
  (bit_and @0 0x8000))

/* 0x7FFF - (x & 0x8000) -> x | 0x7FFF */
(simplify
  (minus 0x7FFF (bit_and @0 0x8000))
  (bit_ior @0 0x7FFF))

/* 0x7FFF - (x | 0x7FFF) -> x & 0x8000 */
(simplify
  (minus 0x7FFF (bit_ior:c @0 0x7FFF))
  (bit_and @0 0x8000))

/* 0xFFFE - (x | 0x7FFF) -> x | 0x7FFF */
(simplify
  (minus 0xFFFE (bit_ior:c @0 0x7FFF))
  (bit_ior @0 0x7FFF))

/* (x & 0x7FFF) - x -> x & 0x8000 */
(simplify
  (minus (bit_and:c @0 0x7FFF) @0)
  (bit_and @0 0x8000))

/* x ^ (x + 0x8000) -> 0x8000 */
(simplify
  (bit_xor:c (plus:c @0 0x8000))
  0x8000)

/* x ^ (0x7FFF - x) -> 0x7FFF */
(simplify
  (bit_xor:c @0 (minus 0x7FFF @0))
  0x7FFF)

/* (x + 0x7FFF) ^ 0x7FFF -> -x */
(simplify
  (bit_xor:c (plus:c @0 0x7FFF) 0x7FFF)
  (negate @0))

/* -x ^ 0x8000 -> 0x8000 - x */
(simplify
  (bit_xor:c (negate @0) 0x8000)
  (minus 0x8000 @0))

/* (0x7FFF - x) ^ 0x7FFF -> x */
(simplify
  (bit_xor:c (minus 0x7FFF @0) 0x7FFF)
  @0)

/* ~(x + c) -> ~c - x */
(simplify
  (bit_not (plus:c @0 CONSTANT_CLASS_P@1))
  (minus (bit_not c) @0))

/* -x ^ 0x7FFF -> x + 0x7FFF */
(simplify
  (bit_xor (negate @0) 0x7FFF)
  (plus @0 0x7FFF))

/* (x | c) - c -> x & ∼c */
(simplify
  (minus (bit_ior @0 CONSTANT_CLASS_P@1) @1)
  (bit_and @0 (bit_not @1)))

/* ~(c - x) -> x + ∼c */
(simplify
  (bit_not (minus CONSTANT_CLASS_P@0 @1))
  (plus @1 (bit_not @0)))

/* -c0 == c1 AND (x | c0) + c1 -> x & ∼c1 */
(simplify
  (plus (bit_or @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2)
  (if (...)
   (bit_and @0 (bit_not @2))

/* (c0 & ∼c1) == 0 AND (x ^ c0) | c1 -> x | c1 */

/* 0x7FFF - (x ^ c) -> x ^ (0x7FFF - c) */
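Since the rules are described as untested, a brute-force check is one way to gain confidence in the underlying identities before writing real match.pd patterns. The sketch below checks a handful of them exhaustively over all 16-bit values of x, taking the constants exactly as written above and assuming 16-bit wrapping unsigned arithmetic (an assumption; the report does not state the intended width). The helper names `w16` and `count_mismatches` and the sample constants are hypothetical, and the code says nothing about whether the match.pd syntax itself is correct.

```c
#include <stdint.h>

/* Truncate to 16 bits so results wrap like 16-bit unsigned arithmetic. */
static uint16_t w16(uint32_t v)
{
    return (uint16_t)v;
}

/* Returns how many checks fail; 0 means every identity held. */
static unsigned long count_mismatches(void)
{
    /* Sample constants for the rules that take an arbitrary c or y. */
    static const uint32_t consts[] = {
        0x0000, 0x0001, 0x00F0, 0x1234, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF
    };
    const unsigned nconsts = sizeof consts / sizeof consts[0];
    unsigned long bad = 0;

    for (uint32_t x = 0; x <= 0xFFFF; x++) {
        /* x + (x & 0x8000) -> x & 0x7fff */
        bad += w16(x + (x & 0x8000u)) != w16(x & 0x7FFFu);
        /* (x | 0x8000) + 0x8000 -> x & 0x7FFF */
        bad += w16((x | 0x8000u) + 0x8000u) != w16(x & 0x7FFFu);
        /* x & (0x7FFF - x) -> x & 0x8000 */
        bad += w16(x & (0x7FFFu - x)) != w16(x & 0x8000u);
        /* x ^ (x + 0x8000) -> 0x8000 */
        bad += w16(x ^ (x + 0x8000u)) != w16(0x8000u);
        /* (x + 0x7FFF) ^ 0x7FFF -> -x */
        bad += w16((x + 0x7FFFu) ^ 0x7FFFu) != w16(0u - x);

        for (unsigned i = 0; i < nconsts; i++) {
            uint32_t c = consts[i];
            /* x | (x ^ y) -> x | y */
            bad += w16(x | (x ^ c)) != w16(x | c);
            /* (x | c) - c -> x & ~c */
            bad += w16((x | c) - c) != w16(x & ~c);
            /* ~(x + c) -> ~c - x */
            bad += w16(~(x + c)) != w16(~c - x);
            /* ~(c - x) -> x + ~c */
            bad += w16(~(c - x)) != w16(x + ~c);
        }
    }
    return bad;
}
```

All intermediate values are kept in `uint32_t` and truncated at the end, which avoids signed integer promotion and gives the same low 16 bits as native 16-bit wrapping.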
[Bug tree-optimization/82854] more missing simplifcations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82854 --- Comment #1 from Andi Kleen --- Also I suppose a lot of them could be generalized to 8/16/64bit.
[Bug middle-end/82853] Optimize x % 3 == 0 without modulo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 --- Comment #8 from Andi Kleen --- I'm not sure if it works with other numbers too. (need to dig through Hacker's delight & Matters Computational to see if they have anything on it) But it could be extended for other word lengths at least BTW there are some other cases, will file a bug shortly on those too.
[Bug middle-end/82853] Optimize x % 3 == 0 without modulo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 --- Comment #5 from Andi Kleen --- Also I'm not sure why you would want it in the middle end. It should all work at the tree level
[Bug middle-end/82853] Optimize x % 3 == 0 without modulo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 --- Comment #4 from Andi Kleen --- Right it's about special casing the complete expression
[Bug tree-optimization/82853] New: Optimize x % 3 == 0 without modulo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Bug ID: 82853
Summary: Optimize x % 3 == 0 without modulo
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

Ralph Levien pointed out as part of FizzBuzz optimization:

Turns out you can compute x%3 == 0 with even fewer steps, it's (x*0xaaaaaaab) < 0x55555556 (assuming wrapping unsigned 32 bit arithmetic).

gcc currently generates the full modulo and then checks. Could be done in match.pd I suppose.

Test case

unsigned mod3(unsigned a) { return 0==(a%3); }
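The trick works because 0xAAAAAAAB is the multiplicative inverse of 3 modulo 2^32, so multiplying by it maps the multiples of 3 (and only those) onto the range 0..0x55555555. A small sketch that compares the multiply-and-compare form against the plain modulo is given below; the function names are hypothetical and the check only samples the low and high ends of the 32-bit range rather than all values.

```c
#include <stdint.h>

/* x % 3 == 0 for 32-bit unsigned x, using one wrapping multiply
   and one compare instead of a division. */
static int is_multiple_of_3(uint32_t x)
{
    return (uint32_t)(x * UINT32_C(0xAAAAAAAB)) < UINT32_C(0x55555556);
}

/* Compares against the straightforward modulo on the lowest and
   highest 2^20 values; returns the number of disagreements. */
static unsigned long mod3_mismatches(void)
{
    unsigned long bad = 0;

    for (uint32_t x = 0; x < (UINT32_C(1) << 20); x++)
        bad += is_multiple_of_3(x) != (x % 3 == 0);

    for (uint32_t k = 0; k < (UINT32_C(1) << 20); k++) {
        uint32_t x = UINT32_MAX - k;
        bad += is_multiple_of_3(x) != (x % 3 == 0);
    }
    return bad;
}
```

The same construction should carry over to other odd divisors and word lengths by swapping in the matching inverse and bound, but that generalization is not checked here.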
[Bug other/82784] Remove semicolon after "do {} while (0)" macros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82784

Andi Kleen changed:

What |Removed |Added
CC   |        |andi-gcc at firstfloor dot org

--- Comment #5 from Andi Kleen ---
Sounds like a good candidate for a new warning
[Bug c/82013] New: better error message for missing semicolon in prototype
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82013

Bug ID: 82013
Summary: better error message for missing semicolon in prototype
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

gcc gives quite poor error messages when forgetting a semicolon after a prototype (common mistake when cut'n'pasting a function definition into a header)

It's especially confusing when the prototype is the last in the include file, because then the errors appear in another file.

As a minimum it should warn about a missing semicolon at the end of a file. Possibly this could be also used for fix-it, but that's likely more complicated.
[Bug target/80742] New: attribute target no- does not work
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80742

Bug ID: 80742
Summary: attribute target no- does not work
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

Disabling ISAs with attribute target doesn't seem to work on x86_64

e.g.

typedef float __m128 __attribute__ ((vector_size (16)));

__attribute__((target("no-sse2")))
__m128 func (__m128 x, __m128 y)
{
  __m128 xmm0 = x, xmm1 = y, xmm2;
  xmm0 = __builtin_ia32_xorps (xmm1, xmm1);
  return xmm0;
}

does not error out.
[Bug testsuite/79067] gcc.dg/tree-prof/cold_partition_label.c runs a million times longer than it used to and times out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79067

--- Comment #3 from Andi Kleen ---
sandra, does this patch fix it?

diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 6214e3629f2..924a270e1bd 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -2,6 +2,7 @@
    gets a label.  */
 /* { dg-require-effective-target freorder } */
 /* { dg-options "-O2 -freorder-blocks-and-partition -save-temps" } */
+/* { dg-require-profiling "-fprofile-generate" } */
 #define SIZE 1
[Bug testsuite/79067] gcc.dg/tree-prof/cold_partition_label.c runs a million times longer than it used to and times out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79067 --- Comment #2 from Andi Kleen --- There's a separate fix for the random failures (or w/a increase /proc/sys/kernel/perf_event_mlock_kb), see PR 77684 Not running the test on systems without FDO seems best. I don't think it does anything useful there anyways.
[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684

--- Comment #5 from Andi Kleen ---
Created attachment 41337
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41337&action=edit
limit perf buffer size

This patch allows parallelism up to 16 with the default setting. Currently testing
[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684

--- Comment #4 from Andi Kleen ---
Thanks for tracing that down. So perf runs out of memory for the locked trace buffers.

Increasing the limit is a good workaround. ulimit -l may also work, but also needs root.

We could just pass a smaller -m value to perf.

Does it work when you change the last line in config/i386/gcc-auto-profile to add -m 128k (or possibly other values, have to be power of two)?
[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378 --- Comment #8 from Andi Kleen --- __builtin_constant_p does not cover variable range information, which is what we're looking for here to prevent security bugs. Also in my experience these explicit expressions tend to be somewhat fragile and are not well specified. They have to assume that the optimizer does specific operations which are nowhere guaranteed. An explicit builtin could be much more tightly defined.
[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #6 from Andi Kleen ---
In the kernel there is also an upper limit on allocations.

Perhaps just a generic assert builtin that:
- uses value range information
- uses constant propagation
- is a nop when the compiler doesn't have either of these available
- otherwise warns at build time

__builtin_compile_assert(size >= 0 && size < MAX_ALLOC_SIZE);
[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #4 from Andi Kleen ---
I tested it now and the inline trick doesn't work. Here's a test case

extern void *do_alloc(int a, int b);

static inline __attribute__((alloc_size(1))) void check_alloc_size(int size)
{
}

static inline void *alloc(int a, int b)
{
        check_alloc_size(a + b);
        return do_alloc(a, b);
}

void func(void)
{
        alloc(-1, 0);
}
[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378 --- Comment #3 from Andi Kleen --- Hmm, that trick may work for the shift too. Let me try.
[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378 --- Comment #1 from Andi Kleen --- Small correction: argument 4 would need to be a constant for shifted by.
[Bug lto/80379] New: Redundant note: code may be misoptimized unless -fno-strict-aliasing is used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80379

Bug ID: 80379
Summary: Redundant note: code may be misoptimized unless -fno-strict-aliasing is used
Product: gcc
Version: 6.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

I get an extra

note: code may be misoptimized unless -fno-strict-aliasing is used

note for type mismatches in LTO builds. But -fno-strict-aliasing is already set. In this case the extra note is pointless and should be suppressed.
[Bug c/80378] New: Extend alloc_size attribute for better Linux kernel checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

Bug ID: 80378
Summary: Extend alloc_size attribute for better Linux kernel checking
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

I've been adding alloc_size attributes to the Linux kernel allocators. However there are some allocator patterns that currently cannot be described correctly. It would be nice if the attribute could be extended with more parameters to handle this.

One is

void *alloc(int size_a, int size_b)

where the allocation size is size_a + size_b

The other is

void *alloc_order(int order)

where the allocation size is constant << order

This could be handled by two extra parameters to alloc_size, one to give a sum argument and another to give a shifted-by argument. The arguments 2,3 would also need to support an "ignore" parameter (e.g. -1)
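To make the gap concrete, the sketch below puts the two forms the existing attribute can express (size in one argument, or the product of two arguments) next to the two patterns from the report that it cannot. This is an illustration only: the wrappers sit on top of malloc/calloc rather than kernel allocators, the names `alloc_bytes`, `alloc_array`, `alloc_sum`, `alloc_order` and `BASE_ALLOC_SIZE` are hypothetical, and `size_t` is used instead of the `int` parameters in the report.

```c
#include <stddef.h>
#include <stdlib.h>

/* alloc_size is a GNU attribute; compile it away elsewhere. */
#if defined(__GNUC__)
#define ALLOC_SIZE(...) __attribute__((alloc_size(__VA_ARGS__)))
#else
#define ALLOC_SIZE(...)
#endif

/* Hypothetical base size for the order-based allocator. */
#define BASE_ALLOC_SIZE ((size_t)4096)

/* Expressible today: the size is argument 1. */
ALLOC_SIZE(1) static void *alloc_bytes(size_t size)
{
    return malloc(size);
}

/* Expressible today: the size is argument 1 times argument 2. */
ALLOC_SIZE(1, 2) static void *alloc_array(size_t n, size_t elem_size)
{
    return calloc(n, elem_size);
}

/* Not expressible: the size is the SUM of two arguments. */
static size_t sum_alloc_size(size_t size_a, size_t size_b)
{
    return size_a + size_b;
}

static void *alloc_sum(size_t size_a, size_t size_b)
{
    return malloc(sum_alloc_size(size_a, size_b));
}

/* Not expressible: the size is a constant shifted by an argument. */
static size_t order_alloc_size(unsigned order)
{
    return BASE_ALLOC_SIZE << order;
}

static void *alloc_order(unsigned order)
{
    return malloc(order_alloc_size(order));
}
```

With only the one- and two-argument forms available, `alloc_sum` and `alloc_order` have to stay unannotated, so the compiler has no allocation size to check accesses against for pointers they return.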
[Bug lto/60016] gcc-nm does not report static symbols
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60016 --- Comment #2 from Andi Kleen --- This is needed for example to generate backtraces, if the symbol table should be built in instead of read from the binary. The Linux kernel cannot read its own binary, so the symbol table has to be built in.
[Bug gcov-profile/71672] New: inlining indirect calls does not work with autofdo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71672

Bug ID: 71672
Summary: inlining indirect calls does not work with autofdo
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: gcov-profile
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

The current mainline version of autofdo doesn't inline indirect calls based on profiling data. I instrumented a bootstrap and it never triggers.

gcc.dg/tree-prof/indir-call-prof.c also fails (needs the patch kit in https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01786.html applied first).

I did some debugging and it seems to give up in update_inlined_ind_target() here

  /* Program behavior changed, original promoted (and inlined) target is not
     hot any more.  Will avoid promote the original target.

     To check if original promoted target is still hot, we check the total
     count of the unpromoted targets (stored in old_info).  If it is no less
     than half of the callsite count (stored in INFO), the original promoted
     target is considered not hot any more.  */
  if (total >= info->count / 2)

but even with the test commented out it doesn't work.
[Bug target/71659] New: _xgetbv intrinsic missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71659

Bug ID: 71659
Summary: _xgetbv intrinsic missing
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

icc and microsoft have a _xgetbv intrinsic for the XGETBV instruction, which is needed to check if AVX or MPX are supported by the kernel.

gcc is missing an intrinsic for that, so everyone has to write inline assembler. Should add one.
[Bug c/70618] New: better error messages for missing/too many arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70618

Bug ID: 70618
Summary: better error messages for missing/too many arguments
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

When doing API refactorings it is reasonably common to have too many or not enough arguments in function calls. The existing errors in gcc/g++ are not very good for that: I get at least two consecutive ones and they are not very clear.

Since that is common it would be much better if the compiler could compute the minimum edit distance to the real prototype (or the nearest for C++) and then directly suggest what arguments are missing or which are too many.

void foo(int *xp, float *yp, double *zp)
{
}

int x;
float y;
double z;
short k;

void f2(void)
{
        foo(, );        /* forgot x */
        foo(, );        /* forgot y */
        foo(, );        /* forgot z */
        foo();          /* forgot y and z */
        foo();          /* forgot x and y */
        foo(, , , );    /* x too many at end */
        foo(, , , );    /* x too many at start */
        foo(, , , );    /* y too much in the middle */
        foo(, , , );    /* different y in middle */
        foo(, , , );    /* different x at start */
        foo(, , , );    /* different x at end */
}

gcc/tsrc/tmissing.c: In function ‘f2’:
gcc/tsrc/tmissing.c:14:6: warning: passing argument 1 of ‘foo’ from incompatible pointer type [-Wincompatible-pointer-types]
  foo(, ); /* forgot x */
      ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘int *’ but argument is of type ‘float *’
 void foo(int *xp, float *yp, double *zp)
      ^
gcc/tsrc/tmissing.c:14:10: warning: passing argument 2 of ‘foo’ from incompatible pointer type [-Wincompatible-pointer-types]
  foo(, ); /* forgot x */
          ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘float *’ but argument is of type ‘double *’
 void foo(int *xp, float *yp, double *zp)
      ^
gcc/tsrc/tmissing.c:14:2: error: too few arguments to function ‘foo’
  foo(, ); /* forgot x */
  ^
gcc/tsrc/tmissing.c:3:6: note: declared here
 void foo(int *xp, float *yp, double *zp)
      ^
gcc/tsrc/tmissing.c:15:10: warning: passing argument 2 of ‘foo’ from incompatible pointer type [-Wincompatible-pointer-types]
  foo(, ); /* forgot y */
          ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘float *’ but argument is of type ‘double *’
 void foo(int *xp, float *yp, double *zp)
      ^
gcc/tsrc/tmissing.c:15:2: error: too few arguments to function ‘foo’
  foo(, ); /* forgot y */
[Bug tree-optimization/70427] autofdo bootstrap generates wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427 --- Comment #3 from Andi Kleen --- Analyzing the code more it looks like the compiler generates it correctly, the edge returned should not be 0 here.
[Bug tree-optimization/70427] autofdo bootstrap generates wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

--- Comment #2 from Andi Kleen ---
Created attachment 38110
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38110&action=edit
somewhat reduced input file, only single function
[Bug tree-optimization/70427] autofdo bootstrap generates wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

--- Comment #1 from Andi Kleen ---
Created attachment 38109
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38109&action=edit
ipa-profile input

Here's the source of the miscompiled file from the compiler

cc1plus -O2 ipa-profile.i -S

unfortunately have to inspect assembler to see the miscompilation:

look for ipa_generate_profile_summary
then look for get_edge

        call    _ZN11cgraph_node8get_edgeEP6gimple
        testq   %rax, %rax
        movq    %rax, %r15
        je      .L836                   < jump if rax/r15 is 0
        testb   $2, 96(%rax)
        je      .L837
.L836:                                  <--- it can be here
        movq    16(%r12), %rax
        movq    64(%r15), %rsi          <-- BAD

same miscompilation here (just with another register). r15 is referenced after being tested for NULL.
[Bug tree-optimization/70427] New: autofdo bootstrap generates wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

Bug ID: 70427
Summary: autofdo bootstrap generates wrong code
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

I've been working on building gcc with an autofdo bootstrap.

Currently I always run into a crash while rebuilding tree.c with the stage2 compiler and the autofdo information.

Looking at the code it is clearly miscompiled in ipa_profile_generate_summary:

          struct cgraph_edge * e = node->get_edge (stmt);
          if (e && !e->indirect_unknown_callee)
            continue;

   0x0093bb16 <+326>:   callq  0x7be530 <_ZN11cgraph_node8get_edgeEP6gimple>
   0x0093bb1b <+331>:   test   %rax,%rax        # check for NULL
   0x0093bb1e <+334>:   mov    %rax,%r8
   0x0093bb21 <+337>:   je     0x93bb2d <_ZL28ipa_profile_generate_summaryv+349>
   0x0093bb23 <+339>:   testb  $0x2,0x60(%rax)
   0x0093bb27 <+343>:   je     0x93baa7 <_ZL28ipa_profile_generate_summaryv+215>
   0x0093bb2d <+349>:   mov    0x10(%r13),%rax  # go here because of NULL
=> 0x0093bb31 <+353>:   mov    0x40(%r8),%rsi   # but we still reference!

(gdb) p $r8
$4 = 0

The crash is on bb31 because r8 is NULL. The code checked the return value of the call, but then references it afterwards before doing the continue.

Command line option:

cc1plus -fauto-profile=cc1plus.fda -g -O2 tree.i

cc1plus.fda is at http://halobates.de/cc1plus.fda (too big to attach)
[Bug c/28901] -Wunused-variable ignores unused const initialised variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28901 --- Comment #17 from Andi Kleen --- There were a few false or useless ones (e.g. related to macros and specific build configs). I didn't look through them all, but various were semi-legitimate, but also very minor (small), so fixing them won't help much. I think one or two of the ones I looked at may have been real bugs. I still think the warning should not be in -Wall. A thousand+ warnings in real projects is just not acceptable.
[Bug c/28901] -Wunused-variable ignores unused const initialised variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28901

Andi Kleen changed:

What |Removed |Added
CC   |        |andi-gcc at firstfloor dot org

--- Comment #14 from Andi Kleen ---
I'm building a current Linux kernel with allyesconfig, and this new warning causes 1383(!) new warnings in the build. I think this should be revisited and the warning be turned off again.
[Bug target/68602] New: i386: -mtune/arch options not all output by -v --help
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68602

Bug ID: 68602
Summary: i386: -mtune/arch options not all output by -v --help
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

gcc -v --help does not output all the possible options for -mtune=/-march=

For example corei7-avx is missing for arch, which is Sandy Bridge.

tune is also missing all cpu names

  -march=CPU[,+EXTENSION...]
                          generate code for CPU and EXTENSION, CPU is one of:
                           generic32, generic64, i386, i486, i586, i686,
                           pentium, pentiumpro, pentiumii, pentiumiii, pentium4,
                           prescott, nocona, core, core2, corei7, l1om, k1om,
                           k6, k6_2, athlon, opteron, k8, amdfam10, bdver1,
                           bdver2, bdver3, bdver4, btver1, btver2
                          EXTENSION is combination of:
                           8087, 287, 387, no87, mmx, nommx, sse, sse2, sse3,
                           ssse3, sse4.1, sse4.2, sse4, nosse, avx, avx2,
                           avx512f, avx512cd, avx512er, avx512pf, avx512dq,
                           avx512bw, avx512vl, noavx, vmx, vmfunc, smx, xsave,
                           xsaveopt, xsavec, xsaves, aes, pclmul, fsgsbase,
                           rdrnd, f16c, bmi2, fma, fma4, xop, lwp, movbe, cx16,
                           ept, lzcnt, hle, rtm, invpcid, clflush, nop, syscall,
                           rdtscp, 3dnow, 3dnowa, padlock, svme, sse4a, abm,
                           bmi, tbm, adx, rdseed, prfchw, smap, mpx, sha,
                           clflushopt, prefetchwt1, se1, clwb, pcommit,
                           avx512ifma, avx512vbmi
  -mtune=CPU              optimize for CPU, CPU is one of:
                           generic32, generic64, i8086, i186, i286, i386, i486,
                           i586, i686, pentium, pentiumpro, pentiumii,
                           pentiumiii, pentium4, prescott, nocona, core, core2,
                           corei7, l1om, k1om, k6, k6_2, athlon, opteron, k8,
                           amdfam10, bdver1, bdver2, bdver3, bdver4, btver1,
                           btver2
[Bug lto/66229] LTO fails with -fauto-profile on mcf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66229 --- Comment #2 from Andi Kleen --- Some analysis of the problem: At the time cc1 is streaming out profile_data it is not set to anything in autofdo. So the LTO files contain all 0 profile data, which later causes the ICE here. Seems to be some kind of ordering problem. Strangely the autofdo pass gets executed in the frontend run, but for unknown reasons the profile data doesn't survive until the LTO data is written.
[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946 Andi Kleen changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #10 from Andi Kleen --- Turned out to be a binutils issue with an old binutils
[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946 --- Comment #9 from Andi Kleen --- Created attachment 36391 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36391&action=edit workaround This workaround fixes it: disable --gc-sections for libstdc++. It seems like a linker bug. I opened a binutils bug report: https://sourceware.org/bugzilla/show_bug.cgi?id=19008
[Bug lto/50676] Partitioning may fail with presence of static variables referring to function labels
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50676 --- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org --- The patch doesn't seem to be checked in yet. Is there a reason for that?
[Bug rtl-optimization/66890] function splitting only works with profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890 --- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org --- Created attachment 36008 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36008&action=edit Updated patch with documentation and param I updated the patch with proper documentation and a param for the cut off. In some tests it appears to do the right thing when building a Linux kernel.
[Bug rtl-optimization/66890] function splitting only works with profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890 --- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org --- I suspect the patch may be too simple because it could get stuck in unlikely, but high frequency edges in the cold area. Perhaps need to adapt more of the code of the non partitioning reordering
[Bug rtl-optimization/66890] function splitting only works with profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890 --- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org --- Created attachment 35993 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35993&action=edit Potential patch This patch fixes the problem for my simple test case. It adds a fall back path to the partition check: if no profile information is available, only the edges are checked, and every block whose incoming edges have a frequency of 20% or less is considered cold. 20% is fairly arbitrary, likely needs tuning and should be a param. But it seems to work for the test case. Comments?
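The fall back check described in this comment can be sketched roughly as follows. This is only an illustration and not the attached patch: `struct edge_info`, `block_is_cold_without_profile` and the percent representation are hypothetical names, and reading "20% or less" as "every incoming edge is at or below the cut off" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CFG edge: only the predicted
   frequency of taking the edge, in percent (0..100). */
struct edge_info {
    int frequency_percent;
};

/* Assumed cut off taken from the comment; the comment itself calls
   the value arbitrary and says it should become a param. */
#define COLD_EDGE_CUTOFF_PERCENT 20

/* Fall back used only when no profile data is available: a block is
   treated as cold when it has at least one incoming edge and every
   incoming edge is predicted at or below the cut off. */
bool block_is_cold_without_profile(const struct edge_info *incoming,
                                   size_t n_incoming)
{
    if (n_incoming == 0)
        return false;   /* e.g. the entry block: never treat as cold */
    for (size_t i = 0; i < n_incoming; i++) {
        if (incoming[i].frequency_percent > COLD_EDGE_CUTOFF_PERCENT)
            return false;
    }
    return true;
}
```

A block reached only through edges predicted at 5% and 20% would be classified as cold here, while a single 80% edge keeps it in the hot partition.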
[Bug rtl-optimization/66890] function splitting only works with profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890 --- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org --- The problem seems to be that bb-reorder.c:find_rarely_executed_basic_blocks_and_crossing_edges returns no edges without profile feedback, which prevents generation of a section split note.
[Bug rtl-optimization/66890] New: function splitting only works with profile feedback
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890 Bug ID: 66890 Summary: function splitting only works with profile feedback Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: ---
Consider this simple example:
    volatile int count;
    int main()
    {
            int i;
            for (i = 0; i < 10; i++) {
                    if (i == 999)
                            count *= 2;
                    count++;
            }
    }
The default EQ is unlikely heuristic in predict.* predicts that the if (i == 999) is unlikely. So the tracer moves the count *= 2 basic block out of line to preserve instruction cache.
gcc50 -O2 -S thotcold.c
            movl    $1, %edx
            jmp     .L2
            .p2align 4,,10
            .p2align 3
    .L4:
            addl    $1, %edx
    .L2:
            cmpl    $1000, %edx
            movl    count(%rip), %eax
            je      .L6
            addl    $1, %eax
            cmpl    $10, %edx
            movl    %eax, count(%rip)
            jne     .L4
            xorl    %eax, %eax
            ret
            # out of line code
    .L6:
            addl    %eax, %eax
            movl    %eax, count(%rip)
            movl    count(%rip), %eax
            addl    $1, %eax
            movl    %eax, count(%rip)
            jmp     .L4
Now if we enable -freorder-blocks-and-partition I would expect it to also be put into .text.unlikely to give an even better cache layout. But that is not what is happening. It generates the same code. Only when I use actual profile feedback and -freorder-blocks-and-partition does the code actually end up in a separate section (it also unrolled the loop, so the code looks a bit different):
gcc -O2 -fprofile-generate -freorder-blocks-and-partition thotcold.c
./a.out
gcc -O2 -fprofile-use -freorder-blocks-and-partition thotcold.c
    ...
            .cfi_endproc
            .section        .text.unlikely
            .cfi_startproc
    .L55:
            movl    count(%rip), %ecx
            addl    $1, %eax
            addl    $1, %ecx
            cmpl    $10, %eax
            movl    %ecx, count(%rip)
            je      .L6
            cmpl    $1, %edx
            je      .L5
            cmpl    $2, %edx
            je      .L28
            cmpl    $3, %edx
-freorder-blocks-and-partition should already use the extra section even without profile feedback. I tested some larger programs and without profile feedback the unlikely section is always empty.
The heuristics in predict.* often work quite well and a lot of code would benefit from moving cold code out of the way of the caches. This would allow using the option to improve frontend-bound code without needing to do full profile feedback.
[Bug lto/61635] LTO partitioner does not handle label in statics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61635 --- Comment #7 from Andi Kleen andi-gcc at firstfloor dot org --- Still happens with current trunk and with newer LTO Linux kernels (4.0-rc*)
[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946 --- Comment #8 from Andi Kleen andi-gcc at firstfloor dot org --- I still get that one with current trunk on my fedora 21 system.
[Bug c/65620] New: Incorrect warning for !! with -Wlogical-not-parentheses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65620 Bug ID: 65620 Summary: Incorrect warning for !! with -Wlogical-not-parentheses Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Created attachment 35172 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35172&action=edit test case
When building the linux 4.0-rc5 kernel with 5.0 there are several imho bogus warnings like
    warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]
for constructs like this:
    !!test_bit(...) != ...
The warning shouldn't warn for !!, which is reasonably common. Looking at the c/cp parsers there is already code to check for this, but it doesn't seem to work here. In the kernel case test_bit actually expands to a complex macro like
    if (usage->type == 0x01 && !!(__builtin_constant_p((usage->code)) ? constant_test_bit((usage->code), (input->key)) : variable_test_bit((usage->key), (input->key)))
I'm attaching an (already delta'ed but still quite big) test case. C++ likely has the same problem (but not tested).
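A reduced illustration of the idiom in question, as a sketch only: `test_bit_raw` is a hypothetical stand-in for the kernel's test_bit() and is not taken from the attached test case. The double negation is deliberate, because it normalizes a non-zero mask result to exactly 0 or 1 before the comparison.

```c
#include <stdbool.h>

/* Hypothetical stand-in for test_bit(): returns a non-zero value
   (not necessarily 1) when bit nr of word is set.  nr is assumed
   to be smaller than the width of unsigned long. */
static unsigned long test_bit_raw(unsigned nr, unsigned long word)
{
    return word & (1UL << nr);
}

/* !! turns the raw mask result into 0 or 1, so the left-hand side
   can be compared directly against an expected 0/1 value. */
bool bit_differs(unsigned nr, unsigned long word, int expected)
{
    return !!test_bit_raw(nr, word) != expected;
}
```

Without the `!!`, a set bit 3 would yield 8 and compare unequal to an expected value of 1, which is why the construct is common in kernel code.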
[Bug bootstrap/65621] New: boot strap with checking enabled ICEs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65621 Bug ID: 65621 Summary: boot strap with checking enabled ICEs Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org target: x86_64-linux ../../../../gcc/libstdc++-v3/libsupc++/tinfo.cc:82:1: internal compiler error: in mark_functions_to_output, at cgraphunit.c:1307 } ^ 0xb25f0b mark_functions_to_output ../../gcc/gcc/cgraphunit.c:1302 0xb29137 symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2330 0xb29313 symbol_table::finalize_compilation_unit() ../../gcc/gcc/cgraphunit.c:2444 0x884c9a cp_write_global_declarations() ../../gcc/gcc/cp/decl2.c:4755
[Bug bootstrap/65621] boot strap with checking enabled ICEs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65621 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org --- Never mind. Was caused by a local modification.
[Bug ipa/64963] IPA Cloning/Splitting does not copy function section attributes resulting in kernel miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64963 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added CC||andi-gcc at firstfloor dot org --- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org --- In theory the kernel could mark __init functions with noclone. But I think sticky behavior would be better. That's the behavior that the kernel expects. There isn't any code as far as I know that would expect only a single function per section.
[Bug ipa/64963] [5 Regression] IPA Cloning/Splitting does not copy function section attributes resulting in kernel miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64963 --- Comment #10 from Andi Kleen andi-gcc at firstfloor dot org --- Yes it has to be fixed. For example with the kernel __kprobes attribute it could cause a real bug (__kprobes marks function that cannot be safely instrumented) We shouldn't inline over different section names either, this could also cause problems for the same reason.
[Bug tree-optimization/64130] New: vrp: handle non zero constant divided by range cannot be zero.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64130 Bug ID: 64130 Summary: vrp: handle non zero constant divided by range cannot be zero. Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org
The following two functions should always be optimized to return 0 because x > 0, x / a cannot be 0. But VRP misses this case for unknown reasons, even though it has some code for it in ranges_from_anti_range()
    int fsigned(int a) { return 100 / a == 0; }
    int funsigned(unsigned a) { return 100 / a == 0; }
gcc50 -fno-non-call-exceptions -O2 -S tvrpdiv.c
gcc version 5.0.0 2014 (experimental) (GCC)
            movl    $100, %eax
            cltd
            idivl   %edi
            testl   %eax, %eax
            sete    %al
            movzbl  %al, %eax
            ret
            xorl    %edx, %edx
            movl    $100, %eax
            divl    %edi
            testl   %eax, %eax
            sete    %al
            movzbl  %al, %eax
[Bug tree-optimization/64130] vrp: handle non zero constant divided by range cannot be zero.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64130 --- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org --- You're right. I actually meant x >= maxval(typeof(a)), x / a cannot be 0. Corrected test case (assuming 64bit target):
    #include <limits.h>
    int fsigned(int a) { return 0x1fffL / a == 0; }
    int funsigned(unsigned a) { return 0x1fffL / a == 0; }
> So this should be optimized to a > 100 instead.
Yes this would make sense too.
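For the original `100 / a == 0` test with an unsigned, non-zero divisor, the comparison is true exactly when `a` is larger than 100, so it is not always 0. The following sketch checks that equivalence by brute force over a caller-chosen range; the helper names are made up for this illustration.

```c
#include <stdbool.h>

/* The comparison from the report, for a non-zero divisor. */
bool quotient_is_zero(unsigned a)
{
    return 100 / a == 0;
}

/* The cheaper equivalent form: no division needed. */
bool divisor_exceeds_100(unsigned a)
{
    return a > 100;
}

/* Returns true when both forms agree for every a in [1, limit].
   limit is assumed to be well below UINT_MAX. */
bool forms_agree_up_to(unsigned limit)
{
    for (unsigned a = 1; a <= limit; a++) {
        if (quotient_is_zero(a) != divisor_exceeds_100(a))
            return false;
    }
    return true;
}
```

The boundary is the interesting part: 100 / 100 is 1, while 100 / 101 is already 0.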
[Bug tree-optimization/63844] [4.8/4.9/5 Regression] open mp parallelization prevents vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63844 --- Comment #12 from Andi Kleen andi-gcc at firstfloor dot org --- Yes should have been omp parallel for
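A sketch of the test loop with the pragma this comment refers to. With a plain `omp parallel`, every thread in the team executes the whole loop; `omp parallel for` is what divides the iterations among the threads. The function name `add_arrays` is only for illustration and does not appear in the report.

```c
#define N 1000

int a[N], b[N], c[N];

/* With -fopenmp the iterations are divided among up to 4 threads;
   without it the pragma is ignored and the loop runs serially. */
void add_arrays(void)
{
    int i;
#pragma omp parallel for num_threads(4)
    for (i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
    }
}
```

Compiling with `gcc -O3 -fopenmp` matches the options used in the report.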
[Bug tree-optimization/63844] [4.8/4.9/5 Regression] open mp parallelization prevents vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63844 --- Comment #13 from Andi Kleen andi-gcc at firstfloor dot org --- I think aggregate IPA-CP does that, IPA-SRA cannot as the function has its address taken. Perhaps that case (only passing address to gomp runtime) could be special cased in the escape analysis.
[Bug bootstrap/63933] Build stage1 with -O2 during bootstrap if host compiler is a recent gcc version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63933 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added CC||andi-gcc at firstfloor dot org --- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org --- Perhaps using -Og (or -O1) if available? I actually like to use an unoptimized stage1 gcc to debug things with gdb. The last time I checked, the worst offenders were some of the C++ inlines not getting inlined, and especially the new RTL code relies very heavily on that. Perhaps just #define inline __attribute__((always_inline)) inline for stage1 would be good enough to fix the worst.
[Bug tree-optimization/63844] open mp parallelization prevents vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63844 --- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org --- Regression, doesn't happen on 4.8
[Bug tree-optimization/63844] [4.8/4.9/5 Regression] open mp parallelization prevents vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63844 --- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org --- I had a typo in the test case (remove += to make the loops identical)
    #define N 1000
    int a[N], b[N], c[N];
    main()
    {
            int i;
    #pragma omp parallel num_threads(4)
            for (i = 0; i < N; i++) {
                    a[i] = b[i] + c[i];
            }
            for (i = 0; i < N; i++) {
                    a[i] = b[i] + c[i];
            }
    }
The case I saw vectorized on 4.8 (opensuse 13.1 compiler), but not on 5.0, was slightly different, auto parallelized:
    #define N 1000
    int a[N], b[N], c[N];
    main()
    {
            int i;
            for (i = 0; i < N; i++) {
                    a[i] = b[i] + c[i];
            }
    }
With -O3 -ftree-parallelize-loops=4. I understand this will just internally generate openmp.
[Bug tree-optimization/63844] New: open mp parallelization prevents vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63844 Bug ID: 63844 Summary: open mp parallelization prevents vectorization Product: gcc Version: 4.9.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org
    #define N 1000
    int a[N], b[N], c[N];
    main()
    {
            int i;
    #pragma omp parallel num_threads(4)
            for (i = 0; i < N; i++) {
                    a[i] = b[i] + c[i];
            }
            for (i = 0; i < N; i++) {
                    a[i] += b[i] + c[i];
            }
    }
compiled with gcc -O3 -fopenmp
The first loop gets parallelized by openmp, the second loop gets vectorized. But why does the parallelized loop not get vectorized too?
[Bug c/60804] Another CilkPlus ICE in gimplify_expr, at gimplify.c:8335
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60804 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #12 from Andi Kleen andi-gcc at firstfloor dot org --- Should be all fixed now in mainline.
[Bug target/63672] New: xbegin/xend/xabort missing memory barriers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63672 Bug ID: 63672 Summary: xbegin/xend/xabort missing memory barriers Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Created attachment 33835 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33835&action=edit proposed patch adding barriers No test case currently, but we got a report that the builtins for x86 RTM xbegin/xend/xabort are missing implicit memory barriers. This can cause code to be moved outside the critical sections, breaking the program.
[Bug middle-end/63556] New: gcc should dedup string postfixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63556 Bug ID: 63556 Summary: gcc should dedup string postfixes Product: gcc Version: 4.9.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org
With this code:
    extern void func(char *a, char *b);
    void f(void)
    {
            func("abc", "xabc");
            func("abc", "abc");
    }
we get:
    .LC0:
            .string "xabc"
    .LC1:
            .string "abc"
So the "abc"s get deduped. But it could also dedup the postfix by pointing "abc" to "xabc" + 1. This would save some space.
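The suffix sharing proposed here can be illustrated at the source level. This is only a sketch of the idea, not how the compiler or linker would implement it; `share_suffix` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer into `longer` when `shorter`
   is a suffix of it, or NULL when the two strings cannot share
   storage.  Both arguments must be NUL-terminated. */
const char *share_suffix(const char *longer, const char *shorter)
{
    size_t llen = strlen(longer);
    size_t slen = strlen(shorter);

    if (slen > llen)
        return NULL;
    /* Suffixes share the terminating NUL, so comparing the tail
       of `longer` against all of `shorter` is sufficient. */
    if (strcmp(longer + (llen - slen), shorter) != 0)
        return NULL;
    return longer + (llen - slen);
}
```

For the strings in the report, `share_suffix("xabc", "abc")` yields the address of the second character of "xabc", which is the `"xabc" + 1` mentioned above. This only works for suffixes because both strings then end at the same NUL byte.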
[Bug middle-end/63556] gcc should dedup string postfixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63556 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added Severity|normal |enhancement
[Bug c/63543] New: incomplete type error should suppress duplicates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63543 Bug ID: 63543 Summary: incomplete type error should suppress duplicates Product: gcc Version: 4.9.2 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org
For a test case like this
    struct undefined;
    int f(struct undefined *f)
    {
            int x = f->a;
            return x + f->a + f->b;
    }
    tmissing-type.c: In function 'f':
    tmissing-type.c:5:11: error: dereferencing pointer to incomplete type
    tmissing-type.c:6:14: error: dereferencing pointer to incomplete type
    tmissing-type.c:6:21: error: dereferencing pointer to incomplete type
gcc outputs three separate errors, one for each reference to the undefined type. It would be better if it remembered that it already gave an error for referencing that type and suppressed the later errors (similar to undefined symbols). This would avoid cascading errors.
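The "remember that an error was already given" behavior could look roughly like this. It is a sketch of the general technique only; the table, its size and the function name are hypothetical and say nothing about how the C front end tracks diagnostics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_REPORTED 64

/* Hypothetical table of incomplete type names already diagnosed.
   Only the pointers are stored, so the names must outlive the table. */
static const char *reported[MAX_REPORTED];
static size_t n_reported;

/* Returns true the first time a type name is seen (the error should
   be printed) and false for every later reference to the same type.
   Once the table is full, new names are reported every time. */
bool should_report_incomplete_type(const char *type_name)
{
    for (size_t i = 0; i < n_reported; i++) {
        if (strcmp(reported[i], type_name) == 0)
            return false;
    }
    if (n_reported < MAX_REPORTED)
        reported[n_reported++] = type_name;
    return true;
}
```

Applied to the test case above, only the first dereference of `struct undefined` would produce an error; the two on line 6 would be suppressed.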
[Bug lto/61969] [4.8/4.9/5 Regression] wrong code by LTO on i?86-linux-gnu (affecting trunk, 4.9.x, and 4.8.x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61969 --- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org --- I looked at this a bit more. It's definitely the nrv pass that causes the problem. When I disable it in the source code the 32bit version compiles correctly. I also tried disabling the next pass (cfgcleanup), but that didn't make a difference. It converts the local variable to be a value-expr. It's still not exactly clear who deletes the variable declaration though. There are two possibilities:
- nrv shouldn't convert the variable in the first place
- someone who messes with the variables forgets to check for value-exprs.
    ;; Function func_52 (func_52, funcdef_no=86, decl_uid=2858, cgraph_uid=54, symbol_order=1152)
    NRV Replaced: l_55 with: <retval>
    func_52 (uint32_t p_53)
    {
      extern const struct S0 l_55 = {.f0=4, .f1=40290, .f2=10, .f3=4} [value-expr: <retval>];
      <bb 2>:
      return <retval>;
    }
[Bug lto/61969] [4.8/4.9/5 Regression] wrong code by LTO on i?86-linux-gnu (affecting trunk, 4.9.x, and 4.8.x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61969 --- Comment #9 from Andi Kleen andi-gcc at firstfloor dot org --- Patch fixes the test case.
[Bug c/63462] [RFC] gcc should prevent from overwriting source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63462 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added CC||andi-gcc at firstfloor dot org --- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org --- Agreed this would be a useful feature. Happened to me at least once too.
[Bug libstdc++/63466] New: sstream is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63466 Bug ID: 63466 Summary: sstream is very slow Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org
sstream is very slow. Comparing two simple programs that parse a stream with C and with sstream, the sstream version is an order of magnitude slower.
gcc version 4.9.1 20140423 (prerelease) (GCC)
# C++
% time ./a.out < testfile
real 0m0.893s
user 0m0.888s
sys 0m0.004s
# C
% time ./tstream-c < testfile
real 0m0.032s
user 0m0.030s
sys 0m0.001s
Here's a profile.
    16.13% a.out libc-2.18.so [.] _IO_getc
    10.39% a.out libc-2.18.so [.] _IO_ungetc
     9.15% a.out libstdc++.so.6.0.20 [.] std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char)
     7.87% a.out libstdc++.so.6.0.20 [.] __dynamic_cast
     4.99% a.out libc-2.18.so [.] __GI___strcmp_ssse3
     3.95% a.out libstdc++.so.6.0.20 [.] std::basic_istream<char, std::char_traits<char> >& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
     3.89% a.out libc-2.18.so [.] _int_free
     2.79% a.out libstdc++.so.6.0.20 [.] __cxxabiv1::__vmi_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const
     2.65% a.out a.out [.] main
     2.58% a.out libc-2.18.so [.] malloc
     2.30% a.out libstdc++.so.6.0.20 [.] __cxxabiv1::__si_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const
     1.96% a.out libc-2.18.so [.] _int_malloc
     1.86% a.out libstdc++.so.6.0.20 [.] std::istream::sentry::sentry(std::istream&, bool)
     1.55% a.out libc-2.18.so [.] _IO_sputbackc
     1.51% a.out libstdc++.so.6.0.20 [.] __gnu_cxx::stdio_sync_filebuf<char, std::char_traits<char> >::underflow()
Test case:
Generate test file:
    perl -e 'for($i=0;$i<100;$i++) { printf("%d %d\n", $i, $i); } ' > testfile
C++ version:
    #include <iostream>
    #include <string>
    #include <sstream>
    using namespace std;
    void __attribute__((noinline, noclone)) func(string &, string &) { }
    int main()
    {
            string line;
            while (getline(cin, line)) {
                    istringstream iss(line);
                    string index, s;
                    if (!(iss >> index >> s))
                            continue;
                    func(index, s);
            }
            return 0;
    }
C version:
    #define _GNU_SOURCE 1
    #include <stdio.h>
    #include <string.h>
    void __attribute__((noinline, noclone)) func(char *a, char *b) { }
    int main()
    {
            char *line = NULL;
            size_t linelen = 0;
            while (getline(&line, &linelen, stdin) > 0) {
                    char *p = line;
                    char *a = strsep(&p, " \t\n");
                    char *b = strsep(&p, " \t\n");
                    func(a, b);
            }
            return 0;
    }
[Bug tree-optimization/63467] New: should have asm statement that does not prevent vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63467 Bug ID: 63467 Summary: should have asm statement that does not prevent vectorization Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org
Currently any inline asm statement in a loop prevents vectorization, like
    #define N 100
    int a[N], b[N], c[N];
    main()
    {
            int i;
            for (i = 0; i < N; i++) {
                    asm("");
                    a[i] = b[i] + c[i];
            }
    }
Without the asm the loop vectorizes fine. This is a problem if you want to add markers into the loop body for static assembler code analysis (for example with IACA, https://software.intel.com/en-us/articles/intel-architecture-code-analyzer). There should be some way to tell the compiler that a particular inline asm statement does not have any side effects that prevent vectorization or other loop transformations. Perhaps an asm const?
[Bug tree-optimization/63467] should have asm statement that does not prevent vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63467 --- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org --- It's the same with asm("" :::); At least the vectorizer bombs out on any asm.
[Bug tree-optimization/63467] should have asm statement that does not prevent vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63467 --- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org --- For the marker case it's enough if it just stays in the same position in the basic block and does get duplicated if the BB gets too. That's somewhat special semantics, that is why I think it would need some way to annotate (asm const?) Ok maybe Andrew's trick works, but it seems fragile. Would that work for other loop transformations (like graphite) too?
[Bug libstdc++/63466] sstream is very slow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63466 --- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org --- Looking at the profile there's plenty of room for optimization, e.g. not using getc/ungetc, but directly accessing the buffer, or maybe even some kind of template specialization. With the variables pulled out it's faster, but still a lot slower than C:
% time ./a.out < testfile
real 0m0.400s
user 0m0.397s
sys 0m0.002s
% time ./tstream-c < testfile
real 0m0.033s
user 0m0.028s
sys 0m0.004s
[Bug c/61898] Variadic functions accept va_list without warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61898 --- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org --- The patch has several issues (making it currently fail bootstrap): - it warns for vfprintf too (fixed) - on i386 it gets confused between va_list * and char *, so something like char *format; char buf[100]; printf(format, buf) warns too because the underlying types are the same. Not sure about a good solution for this, need a new type attribute?
[Bug c/63450] Optimizing -O3 generates rep ret on an almost empty function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63450 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added CC||andi-gcc at firstfloor dot org --- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org --- This is a feature in -mtune=generic because it helps branch prediction in some older AMD CPUs. If you're optimizing for Atom you'll get even more padding due to other reasons. Optimizing e.g. for nehalem should avoid it.
[Bug c/61898] Variadic functions accept va_list without warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61898 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added CC||andi-gcc at firstfloor dot org --- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org --- Created attachment 33633 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33633&action=edit Proposed patch This patch implements the warning for the non constant format case. Not done for passing va_list to a real format, but I assume that is rare and in most cases caught by the normal type checking. Let me know if it works.
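For context, the mistake this warning is meant to catch is handing a `va_list` to a plain variadic function instead of its `v*` counterpart. A minimal sketch of the correct pattern, with the wrong call shown only in a comment; `format_message` is a made-up wrapper name:

```c
#include <stdarg.h>
#include <stdio.h>

/* Formats into buf like snprintf() and returns its result. */
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    /* Wrong would be: snprintf(buf, size, fmt, ap);
       That compiles without a diagnostic when fmt is not a constant,
       but passes the va_list object itself as the first variadic
       argument instead of the arguments it describes. */
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

A call such as `format_message(buf, sizeof buf, "%d-%s", 42, "ok")` then behaves like the equivalent direct `snprintf` call.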
[Bug lto/61969] [4.8/4.9/5 Regression] wrong code by LTO on i?86-linux-gnu (affecting trunk, 4.9.x, and 4.8.x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61969 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added CC||andi-gcc at firstfloor dot org --- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org --- I did some experiments. I can reproduce it with trunk for 32bit. The interesting part is that the printed value seems to be uninitialized on the stack and changes on every run. A valgrind run gives
    ==23130== Use of uninitialised value of size 4
    ==23130==    at 0x40B102B: _itoa_word (in /lib/libc-2.18.so)
    ==23130==    by 0x40B474A: vfprintf (in /lib/libc-2.18.so)
    ==23130==    by 0x40BAFCE: printf (in /lib/libc-2.18.so)
    ==23130==    by 0x40879D2: (below main) (in /lib/libc-2.18.so)
    ==23130==  Uninitialised value was created by a stack allocation
    ==23130==    at 0x80482F4: main (in /home/andi/Downloads/pr61969/t)
    ==23130==
    ... more warnings like this ...
[Bug lto/61969] [4.8/4.9/5 Regression] wrong code by LTO on i?86-linux-gnu (affecting trunk, 4.9.x, and 4.8.x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61969 --- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org --- The problem is when returning a struct from func_52:
    const struct S0 func_52 (uint32_t p_53)
    {
            const struct S0 l_55 = { 4, 40290, 10, 4 };
            return l_55;
    }
The main code stores the struct value from the stack into the global variable and eventually prints it
    80482f4: 83 ec 38                sub    $0x38,%esp
    80482f7: 0f b6 15 4c da 04 08    movzbl 0x804da4c,%edx
    80482fe: 8b 1d 20 d1 04 08       mov    0x804d120,%ebx
    8048304: 0f b6 35 70 da 04 08    movzbl 0x804da70,%esi
    804830b: e8 b0 0c 00 00          call   8048fc0 <func_52>
    8048310: 0f b7 45 d2             movzwl -0x2e(%ebp),%eax
    8048314: ba 01 00 00 00          mov    $0x1,%edx
    8048319: c7 05 20 d1 04 08 48    movl   $0x804da48,0x804d120
    8048320: da 04 08
    8048323: 66 a3 5c da 04 08       mov    %ax,0x804da5c
But func_52 has been completely optimized away and puts nothing onto the stack:
    08048fc0 <func_52>:
    8048fc0: f3 c3                   repz ret
    8048fc2: 8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
    8048fc9: 8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi
So the value is random stack garbage.
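A standalone sketch of what a correct compilation of this pattern must guarantee. The field names follow the dump in this bug (.f0 to .f3), but the field types are an assumption because the definition of `struct S0` is not quoted here, and the `const` on the return type is dropped to keep the example warning-free. It does not reproduce the LTO failure; it only states the expected behavior.

```c
#include <stdint.h>

/* Assumed layout: four 32-bit fields named as in the dump. */
struct S0 {
    uint32_t f0;
    uint32_t f1;
    uint32_t f2;
    uint32_t f3;
};

/* Same shape as func_52 in the report: a const local returned by value. */
struct S0 func_52(uint32_t p_53)
{
    const struct S0 l_55 = { 4, 40290, 10, 4 };
    (void)p_53;     /* parameter is unused, as in the report */
    return l_55;
}

/* Returns 1 when all four fields arrive intact in the caller. */
int result_is_initialized(void)
{
    struct S0 r = func_52(0);
    return r.f0 == 4 && r.f1 == 40290 && r.f2 == 10 && r.f3 == 4;
}
```

In the miscompiled binary the callee body is an empty `repz ret`, so a check like `result_is_initialized()` would read whatever happened to be on the stack.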
[Bug lto/61969] [4.8/4.9/5 Regression] wrong code by LTO on i?86-linux-gnu (affecting trunk, 4.9.x, and 4.8.x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61969 --- Comment #5 from Andi Kleen andi-gcc at firstfloor dot org --- func_52 disappears during/after nrv:
in 173t.nrv:
    ;; Function func_52 (func_52, funcdef_no=86, decl_uid=2858, cgraph_uid=54, symbol_order=1152)
    func_52 (uint32_t p_53)
    {
      extern const struct S0 l_55 = {.f0=4, .f1=40290, .f2=10, .f3=4} [value-expr: <retval>];
      <bb 2>:
      return <retval>;
    }
in 174t.optimized:
    ;; Function func_52 (func_52, funcdef_no=86, decl_uid=2858, cgraph_uid=54, symbol_order=1152)
    func_52 (uint32_t p_53)
    {
      <bb 2>:
      return <retval>;
    }
[Bug rtl-optimization/61605] Potential optimization: Keep unclobbered argument registers live across function calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61605 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added CC||andi-gcc at firstfloor dot org, ||tom at codesourcery dot com --- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org --- This is in theory implemented in mainline with -fuse-caller-save It doesn't seem to work for me though. I also didn't see the option doing anything on a larger program.
[Bug rtl-optimization/61605] Potential optimization: Keep unclobbered argument registers live across function calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61605 --- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org --- It was supposed to be enabled with Date: Fri May 30 11:39:49 2014 + -fuse-caller-save - Enable for i386 2014-05-30 Tom de Vries t...@codesourcery.com * config/i386/i386.c (TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS): Redefine as true. * gcc.target/i386/fuse-caller-save.c: New test. * gcc.dg/ira-shrinkwrap-prep-1.c: Run with -fno-use-caller-save. * gcc.dg/ira-shrinkwrap-prep-2.c: Same. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@211078 138bc75d-0d04-0410-961f-82ee72b054a4
[Bug rtl-optimization/63384] selective scheduling on x86 takes very long
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63384 --- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org --- It loops (forever?) on this in sched2: Scheduling on fences: (uid:28;seqno:7;) Fence 28[2] has not changed Scheduling on fences: (uid:28;seqno:7;) Fence 28[2] has not changed Scheduling on fences: (uid:28;seqno:7;) Fence 28[2] has not changed Scheduling on fences: (uid:28;seqno:7;) Fence 28[2] has not changed Scheduling on fences: (uid:28;seqno:7;) Fence 28[2] has not changed Scheduling on fences: (uid:28;seqno:7;) Fence 28[2] has not changed Scheduling on fences: (uid:28;seqno:7;) Fence 28[2] has not changed Scheduling on fences: (uid:28;seqno:7;) Fence 28[2] has not changed
[Bug tree-optimization/36602] memset should be optimized into an empty CONSTRUCTOR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36602 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added CC||andi-gcc at firstfloor dot org --- Comment #9 from Andi Kleen andi-gcc at firstfloor dot org --- Any progress on fixing the test case, so that this can be finally fixed?
[Bug rtl-optimization/63384] scheduler loops on endless fence list with -fselective-scheduling2 on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63384 --- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org --- It loops forever in this loop in sel_sched_region_2
    while (fences)
      {
        int min_seqno, max_seqno;
        ilist_t scheduled_insns = NULL;
        ilist_t *scheduled_insns_tailp = &scheduled_insns;
        find_min_max_seqno (fences, &min_seqno, &max_seqno);
        schedule_on_fences (fences, max_seqno, &scheduled_insns_tailp);
        fences = calculate_new_fences (fences, orig_max_seqno, &max_time);
        highest_seqno_in_use = update_seqnos_and_stage (min_seqno, max_seqno,
                                                        highest_seqno_in_use,
                                                        &scheduled_insns);
      }
because calculate_new_fences always comes up with a list which is the same as before. In move_fence_to_fences it always goes into the else
    f = flist_lookup (FLIST_TAIL_HEAD (new_fences),
                      FENCE_INSN (FLIST_FENCE (old_fences)));
    if (f)
      {
        merge_fences (f, old->insn, old->state, old->dc, old->tc,
                      old->last_scheduled_insn, old->executing_insns,
                      old->ready_ticks, old->ready_ticks_size,
                      old->sched_next, old->cycle, old->issue_more,
                      old->after_stall_p);
      }
    else
      {
        _list_add (tailp);
        FLIST_TAIL_TAILP (new_fences) = &FLIST_NEXT (*tailp);
So something is going wrong in flist_lookup.
[Bug middle-end/61848] [5 Regression] a previous declaration causes the section attribute to be lost
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848 --- Comment #16 from Andi Kleen andi-gcc at firstfloor dot org --- Can Alan's patch be submitted please? I always need to apply it now before compiling a kernel.
[Bug middle-end/61848] [5 Regression] a previous declaration causes the section attribute to be lost
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848 --- Comment #20 from Andi Kleen andi-gcc at firstfloor dot org --- So the only problem was the missing test case, which you supplied?
[Bug middle-end/63404] New: gcc 5 miscompiles linux block layer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63404

            Bug ID: 63404
           Summary: gcc 5 miscompiles linux block layer
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org

When I boot a current Linux mainline kernel compiled with mainline gcc and the section fix patch applied, I always get a crash at boot in the block layer.

gcc version 5.0.0 20140926 (experimental) (GCC)

[1.318801] EXT4-fs (sda1): write access will be enabled during recovery
[1.367592] [ cut here ]
[1.369061] kernel BUG at /home/andi/lsrc/linux/block/blk-flush2.c:80!
[1.370910] invalid opcode: [#1] SMP

I narrowed it down to one function. When only that function is compiled with gcc 4.9, the kernel boots.

Attached is a test case with only the function. It doesn't quite run by itself yet, so the code has to be examined.
[Bug middle-end/63404] gcc 5 miscompiles linux block layer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63404

--- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org ---
Created attachment 33607
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33607&action=edit
not quite yet runnable test case

In the real execution blk_flush_complete_seq always ends up in the default case in the switch and crashes.
[Bug target/63404] gcc 5 miscompiles linux block layer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63404

Andi Kleen andi-gcc at firstfloor dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
The switch is miscompiled and destroys the flags register in the middle of a comparison:

.LVL2:
        .loc 1 49 0
        cmpl    $2, %eax        #, seq
        je      .L5             #,
        shrb    $2, %r12b       #, D.32130      BAD1
        andl    $1, %r12d       #, D.32130      BAD2
        jbe     .L24            #,
        cmpl    $4, %eax        #, seq
        je      .L7             #,
        cmpl    $8, %eax        #, seq
        jne     .L4             #,

gcc 4.9 creates the same code except for BAD1/BAD2.

The JBE relies on the CF/ZF values from the CMP being preserved, but SHR can overwrite ZF/CF, which breaks the JBE after the CMP.

So somehow the backend lost track of these two flag bits.
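The flag clobbering described above can be checked with a small model in C. This is only a sketch under stated assumptions: it tracks just CF and ZF, all `model_*` names are made up for illustration, and the AND is applied to the low byte only because the upper bits of %r12d are not known from the report. It relies on the documented x86 behaviour that CMP sets CF on unsigned borrow and ZF on equality, that SHR with a non-zero count sets CF to the last bit shifted out, that AND clears CF, and that JBE is taken when CF or ZF is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the two flags that JBE reads. */
struct flags { bool cf; bool zf; };

/* CMP computes a - b: CF is the unsigned borrow, ZF is equality. */
static struct flags model_cmp(uint32_t a, uint32_t b)
{
    struct flags f = { a < b, a == b };
    return f;
}

/* SHR on an 8-bit value with a count in 1..7: CF is the last bit shifted out. */
static struct flags model_shr8(uint8_t *v, unsigned count)
{
    struct flags f;
    f.cf = (*v >> (count - 1)) & 1;
    *v = (uint8_t)(*v >> count);
    f.zf = (*v == 0);
    return f;
}

/* AND clears CF and sets ZF from the result. */
static struct flags model_and(uint32_t *v, uint32_t mask)
{
    *v &= mask;
    struct flags f = { false, *v == 0 };
    return f;
}

/* JBE is taken when CF or ZF is set. */
static bool model_jbe_taken(struct flags f)
{
    return f.cf || f.zf;
}

/* Replays cmpl / shrb (BAD1) / andl (BAD2) and reports whether JBE is taken.
   The flags from the compare are overwritten before the jump reads them. */
static bool model_bad_sequence(uint32_t seq, uint8_t r12b)
{
    struct flags f = model_cmp(seq, 2);   /* cmpl $2, %eax */
    f = model_shr8(&r12b, 2);             /* shrb $2, %r12b */
    uint32_t r12d = r12b;                 /* assumption: upper bits are zero */
    f = model_and(&r12d, 1);              /* andl $1, %r12d */
    return model_jbe_taken(f);
}
```

With `seq == 1` the compare alone would take the JBE, but `model_bad_sequence(1, 0x04)` does not. With `seq == 4` the compare alone would fall through, yet `model_bad_sequence(4, 0x00)` takes the branch. In both cases the outcome depends on the unrelated register value rather than on `seq`.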
[Bug rtl-optimization/63384] ICE in moveup_expr_cached->sel_bb_head->bb_node with special options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63384

--- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org ---
With a newer compiler version

gcc version 5.0.0 20140926 (experimental) (GCC)

the test case no longer crashes, but runs for a very long time. I killed it after 20s.

This happens with the following two options:

g++50 matrix.i -o outfile -O2 -fvar-tracking-assignments-toggle -fselective-scheduling2

The overhead is mostly in the scheduler:

-  sched_analyze_insn(deps_desc*, rtx_def*, rtx_insn*)
   - 99.39% deps_analyze_insn(deps_desc*, rtx_insn*)
        tick_check_p(_expr*, deps_desc*, _fence*)
        fill_insns(_fence*, int, _list_node***)
        sel_sched_region_2(int)
        sel_sched_region(int)
        run_selective_scheduling()
        (anonymous namespace)::pass_sched2::execute(function*)
        execute_one_pass(opt_pass*)
        execute_pass_list_1(opt_pass*)
        execute_pass_list_1(opt_pass*)
        execute_pass_list_1(opt_pass*)
        execute_pass_list(function*, opt_pass*)
        cgraph_node::expand()
        symbol_table::compile()
        symbol_table::finalize_compilation_unit()
        cp_write_global_declarations()
        compile_file()
        toplev_main(int, char**)
        __libc_start_main
   + 0.61% tick_check_p(_expr*, deps_desc*, _fence*)

sched_analyze_insn(deps_desc*, rtx_def*, rtx_insn*)
       │        for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 12.84 │ 748:   add    $0x1,%r13d
  0.07 │        add    $0x30,%r14
       │        cmp    $0x4d,%r13d
       │      ↓ je     7e5
       │          if (TEST_HARD_REG_BIT (implicit_reg_pending_uses, i))
  0.06 │ 75a:   mov    %r13d,%eax
 12.45 │        shr    $0x6,%eax
  0.17 │        mov    0x1828100(,%rax,8),%rax
  6.06 │        bt     %r13,%rax
  6.21 │      ↑ jae    748
       │          {
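The annotated hot loop above tests one bit per hard register: the `shr $0x6` selects a 64-bit word and the `bt` tests the bit inside it. The following C sketch shows that pattern in isolation. It is an illustration only, not GCC's `HARD_REG_SET` implementation: the `model_*` names are hypothetical, and the register count of 77 is an assumption read off the `cmp $0x4d` in the profile.

```c
#include <stdint.h>

/* Assumption: 0x4d (77) is the loop bound seen in the annotated assembly. */
#define MODEL_NUM_REGS 77
#define MODEL_WORDS ((MODEL_NUM_REGS + 63) / 64)

/* A register set stored as an array of 64-bit words. */
typedef uint64_t model_reg_set[MODEL_WORDS];

/* Test bit i: i / 64 picks the word (shr $0x6), i % 64 picks the bit (bt). */
static int model_test_bit(const model_reg_set set, unsigned i)
{
    return (int)((set[i / 64] >> (i % 64)) & 1);
}

/* Set bit i in the register set. */
static void model_set_bit(model_reg_set set, unsigned i)
{
    set[i / 64] |= (uint64_t)1 << (i % 64);
}

/* Count the registers in the set by visiting every index,
   which is the same access pattern as the profiled loop. */
static unsigned model_count_uses(const model_reg_set set)
{
    unsigned n = 0;
    for (unsigned i = 0; i < MODEL_NUM_REGS; i++)
        if (model_test_bit(set, i))
            n++;
    return n;
}
```

Each call to `model_count_uses` is cheap on its own. The profile suggests the cost comes from the enclosing analysis being run an enormous number of times, so the per-register loop shows up as the hottest code.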
[Bug rtl-optimization/63384] selective scheduling on x86 takes very long
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63384

Andi Kleen andi-gcc at firstfloor dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #33585|0                           |1
        is obsolete|                            |

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
Created attachment 33600
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33600&action=edit
Reduced test case for long compile time

Oddly, the problem goes away when the unused variable allocation is commented out.
[Bug target/63382] New: gcc 5 breaks linux early bootup in QEMU
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63382

            Bug ID: 63382
           Summary: gcc 5 breaks linux early bootup in QEMU
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org

No debug so far. But a gcc 5 compiled x86 Linux kernel cannot boot in qemu/KVM with -kernel bzImage. qemu always resets and loops directly after starting to execute the kernel image.

The same kernel compiled with an older compiler works fine.
[Bug middle-end/61848] [5 Regression] a previous declaration causes the section attribute to be lost
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848

Andi Kleen andi-gcc at firstfloor dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andi-gcc at firstfloor dot org

--- Comment #15 from Andi Kleen andi-gcc at firstfloor dot org ---
*** Bug 63382 has been marked as a duplicate of this bug. ***
[Bug target/63382] gcc 5 breaks linux early bootup in QEMU
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63382

Andi Kleen andi-gcc at firstfloor dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
Yes, Alan's patch fixes it. So it's a dup.

*** This bug has been marked as a duplicate of bug 61848 ***