[Bug tree-optimization/86625] funroll-loops doesn't unroll, producing >3x assembly and running 10x slower than manual complete unrolling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86625

Alexander Monakov changed:

What      |Removed          |Added
CC        |                 |amonakov at gcc dot gnu.org
Component |rtl-optimization |tree-optimization

--- Comment #1 from Alexander Monakov ---
Please supply testcase(s) as Bugzilla attachments, not external links.

At -O3/-Ofast the main issue is early unrolling ('cunrolli') splatting all simple 16-iteration inner loops. After that, imho, all hope is lost, and yeah, it looks like we try to vectorize across the other dimension.

With -O3 -fdisable-tree-cunrolli, or with -O2 -ftree-vectorize, we do get the correct vectorization pattern, but a couple of problems remain: after vect, tree optimizations cannot hoist/sink memory references out of the outer loop, leaving 2 loads, 1 load-broadcast and 1 store per fma. Later, RTL PRE cleans up redundant vector loads, but load-broadcasts and stores remain.
[Bug c/86617] [6/7/8/9 Regression] Volatile qualifier is ignored sometimes for unsigned char
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86617

Alexander Monakov changed:

What             |Removed                                                   |Added
Keywords         |                                                          |wrong-code
Status           |UNCONFIRMED                                               |NEW
Last reconfirmed |                                                          |2018-07-21
CC               |                                                          |amonakov at gcc dot gnu.org
Summary          |Volatile qualifier is ignored sometimes for unsigned char |[6/7/8/9 Regression] Volatile qualifier is ignored sometimes for unsigned char
Ever confirmed   |0                                                         |1

--- Comment #1 from Alexander Monakov ---
Confirmed, 'unsigned short' is similarly mishandled, but not wider integer types. gcc-4.9 got this right.

Appears like over-eager folding in the frontend: in the .original dump I get

    {
      u8 = u8 * 2;
      u8 = u8, 0;
    }
[Bug c++/86586] [6/7/8/9 Regression] -Wsign-compare affects code generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86586 --- Comment #7 from Alexander Monakov --- Another possible compromise is to add 'bool for_warnings = false' argument to maybe_constant_value, store it along with the reduced tree in cv_cache (perhaps even by setting a flag on the tree itself?), and then when retrieving from cv_cache when !for_warnings, but the retrieved tree has the flag set, throw it away and recompute. That should be a fairly simple change that keeps the current speed when the warnings are disabled or main code generation needs the reduced tree before some of the warnings do.
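The caching compromise described above can be sketched roughly as follows. This is a toy model only: the one-slot cache, the integer "expressions", and the names `cv_entry`, `reduce`, `cached_constant_value` and `cv_recomputations` are all hypothetical stand-ins, not the real `cv_cache` or `maybe_constant_value` from the C++ front end.

```c
#include <stdbool.h>

/* Hypothetical one-slot cache; the real cv_cache maps trees to trees. */
struct cv_entry {
    bool valid;
    int key;
    int value;
    bool for_warnings;   /* entry was computed on behalf of a warning */
};

static struct cv_entry slot;
static int recomputations;

/* Stand-in for the expensive constant reduction. */
static int reduce(int key)
{
    recomputations++;
    return key * 2;
}

int cached_constant_value(int key, bool for_warnings)
{
    bool hit = slot.valid && slot.key == key;

    /* A non-warning caller must not reuse a result that exists only
       because a warning asked for it: discard it and recompute.  */
    if (hit && !(slot.for_warnings && !for_warnings))
        return slot.value;

    slot.valid = true;
    slot.key = key;
    slot.value = reduce(key);
    slot.for_warnings = for_warnings;
    return slot.value;
}

/* Accessor so callers can observe how often the reduction ran. */
int cv_recomputations(void)
{
    return recomputations;
}
```

In this sketch a warning-only entry costs one extra reduction when code generation later asks for the same key, while every other access pattern keeps the cached speed, which matches the trade-off the comment argues for.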
[Bug c++/86586] New: [6/7/8/9 Regression] -Wsign-compare affects code generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86586

Bug ID: 86586
Summary: [6/7/8/9 Regression] -Wsign-compare affects code generation
Product: gcc
Version: 6.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Blocks: 86518
Target Milestone: ---

    void f ()
    {
      __builtin_cpu_supports ("avx2") && __builtin_cpu_supports ("ssse3");
    }

ICEs with 'g++ -std=c++98 -fcompare-debug=-Wsign-compare'.

This is minimized from mv1.C in the testsuite. I know it's inconvenient that this test depends on an x86-specific builtin, but unfortunately I don't see other tests failing (apart from cp/mangle.c miscomparing on bootstrap with/without the warning).

This may be similar to PR 86567: there's a use of maybe_constant_value guarded by warn_sign_compare.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518
[Bug 86518] Strengthen bootstrap comparison by not enabling warnings at stage3
[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518 --- Comment #9 from Alexander Monakov --- One more: -Wimplicit-fallthrough issue uncovered by the testsuite: PR 86575. So far all issues appeared in gcc-6 or more recent.
[Bug middle-end/86575] New: -Wimplicit-fallthrough affects code generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86575

Bug ID: 86575
Summary: -Wimplicit-fallthrough affects code generation
Product: gcc
Version: 7.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

    void f2 (int a, int b, int c, int d)
    {
      switch (b)
        {
        default:
          for (int e = 0; e < c; ++e)
            if (e == d)
              break;
        }
    }

ICEs as both C and C++ using 'gcc -fcompare-debug=-Wimplicit-fallthrough'.

This is minimized from pr81275-1.C in the testsuite (the -2 and -3 variants of the original test also fail).
[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518 --- Comment #8 from Alexander Monakov --- Other files seem to miscompare due to -Wnonnull-compare: PR 86569.
[Bug c++/86569] New: -Wnonnull-compare affects code generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86569

Bug ID: 86569
Summary: -Wnonnull-compare affects code generation
Product: gcc
Version: 6.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

    bool b;
    int main ()
    {
      return ((!b) != 0);
    }

ICEs with g++ -fcompare-debug=-Wnonnull-compare (this is bool6.C in the testsuite).

It looks as if the warning prevents folding '!b != 0' to '!b'.
[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518 --- Comment #7 from Alexander Monakov --- cp/mangle.o miscompares due to -Wsign-compare, possibly due to caching in maybe_constant_value as in the above PR.
[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518 --- Comment #6 from Alexander Monakov --- GCC 7 sadly has a similar list of miscomparing files. Did not check GCC 6 yet. So far I managed to catch one set of misbehaving warnings by checking testsuite fallout with -fcompare-debug=-Wall, but unfortunately fixing those would not reduce the number of bootstrap miscompares: PR 86567.
[Bug c++/86567] New: [8/9 Regression] -Wnonnull/-Wformat/-Wrestrict affect code generation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86567

Bug ID: 86567
Summary: [8/9 Regression] -Wnonnull/-Wformat/-Wrestrict affect code generation
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

    #include <vector>
    std::vector f()
    {
      std::vector r;
      return r;
    }

starting with gcc-8 ICEs using 'g++ -fcompare-debug=-Wnonnull' (as well as Wformat, Wrestrict, Wsuggest-attribute=format).

cp/call.c:build_over_call() has:

      if (warn_nonnull
          || warn_format
          || warn_suggest_attribute_format
          || warn_restrict)
        {
          tree *fargs = (!nargs ? argarray
                                : (tree *) alloca (nargs * sizeof (tree)));
          for (j = 0; j < nargs; j++)
            {
              /* For -Wformat undo the implicit passing by hidden reference
                 done by convert_arg_to_ellipsis.  */
              if (TREE_CODE (argarray[j]) == ADDR_EXPR
                  && TYPE_REF_P (TREE_TYPE (argarray[j])))
                fargs[j] = TREE_OPERAND (argarray[j], 0);
              else
                fargs[j] = maybe_constant_value (argarray[j]);
            }

          warned_p = check_function_arguments (input_location, fn, TREE_TYPE (fn),
                                               nargs, fargs, NULL);
        }

Bypassing this block makes the ICE go away, which indicates that something in the snippet may affect code generation (not investigating further).
[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518 --- Comment #4 from Alexander Monakov --- Yep, that's correct: -Wno-narrowing is necessary for build to succeed at all.
[Bug bootstrap/86518] New: Strengthen bootstrap comparison by not enabling warnings at stage3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518

Bug ID: 86518
Summary: Strengthen bootstrap comparison by not enabling warnings at stage3
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: bootstrap
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

Currently stage2 and 3 use the same warning options, but that is redundant: if any warnings are generated, they will be present at stage2 (and stop bootstrap). By not enabling any warnings for stage3, we would get checking that warnings do not affect code generation.

Note that simply adding -w at stage3 doesn't work, as it simply suppresses the warning at print time.

I tried leaving only -Wno-narrowing in warning flags and got many comparison failures:

Comparing stages 2 and 3
warning: gcc/cc1obj-checksum.o differs
Bootstrap comparison failure!
gcc/calls.o differs
gcc/dwarf2out.o differs
gcc/loop-iv.o differs
gcc/generic-match.o differs
gcc/ipa-inline.o differs
gcc/builtins.o differs
gcc/optabs.o differs
gcc/tree-vrp.o differs
gcc/profile.o differs
gcc/i386.o differs
gcc/cfgexpand.o differs
gcc/simplify-rtx.o differs
gcc/gimple-ssa-sprintf.o differs
gcc/expr.o differs
gcc/print-tree.o differs
gcc/gimple-match.o differs
gcc/godump.o differs
gcc/gimple-ssa-nonnull-compare.o differs
gcc/targhooks.o differs
gcc/tree-ssa-live.o differs
gcc/gimple-ssa-warn-restrict.o differs
gcc/tree-ssa-ccp.o differs
gcc/gimplify.o differs
gcc/tree-cfg.o differs
gcc/tree-pretty-print.o differs
make: *** [compare] Error 1
[Bug lto/86490] lto1: fatal error: multiple prevailing defs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86490

--- Comment #8 from Alexander Monakov ---
(In reply to H.J. Lu from comment #7)
> It is to be consistent for common symbol linked against .a or .so.

That seems like a really strange reason because without --whole-archive there are other ways to arrive at an apparent "inconsistency", while with --whole-archive there's no need for special treatment as the "consistent" result is achieved automatically.
[Bug lto/86490] lto1: fatal error: multiple prevailing defs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86490

--- Comment #6 from Alexander Monakov ---
(In reply to H.J. Lu from comment #5)
> When ld sees a common symbol, it will use a non-common definition
> in a library, .a or .so, to override it.

This is surprising, is it documented somewhere? I don't think the ELF spec suggests something like that needs to happen.

> Do you have a testcase?

No, it would take some time to prepare.
[Bug lto/86490] lto1: fatal error: multiple prevailing defs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86490

--- Comment #4 from Alexander Monakov ---
(In reply to H.J. Lu from comment #3)
> It is because gold doesn't check archive for a common definition.

Please elaborate - does ld.bfd try to extract static archive members when it already has a common definition? Why?

> Is there a common symbol involved?

I don't think so, but I'm not sure. We've also seen other pain points like the same member extracted and given to the plugin multiple times, even though the second extraction cannot possibly satisfy any unresolved references.
[Bug lto/86490] lto1: fatal error: multiple prevailing defs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86490 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #2 from Alexander Monakov --- Note that Gold does not exhibit this issue. I think ld.bfd is at fault here. We've hit similar issues with some internal plugin development. The main issue is, ld.bfd feeds the plugin with objects extracted from static archives, but those objects do not satisfy any unresolved references and would not be extracted in the first place in non-LTO link. So ld.bfd is causing useless extra work both for itself and the compiler plugin. It would be nice to fix this on ld.bfd side so future plugin writers don't need to wrestle with this issue.
[Bug lto/86442] Wrong error: global register variable follows a function definition when using LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86442 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #2 from Alexander Monakov --- Indeed, I think the build system should be passing -fno-lto for such translation units, as their ABI is different from the rest. I'm not sure limiting the inliner is enough, there's also IPA-RA and it's not obviously safe w.r.t global-reg-var differences. If there's a desire to support such usage "seamlessly", I'd really like to see the same solution for this and toplevel asms: making such input translation unit one LTO partition (i.e. not splitting or merging it with anything else).
[Bug tree-optimization/86435] -fsemantic-interposition does not appear to have any effect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86435 Alexander Monakov changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||amonakov at gcc dot gnu.org Resolution|--- |INVALID --- Comment #1 from Alexander Monakov --- Without -fpic, f1 is considered not interposable. With -fpic, gcc needs -fsemantic-interposition to optimize f2 to 'return 0;'.
[Bug c/86420] [9 regression] nextafter(0x1p-1022,0) is constant folded
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86420 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org, ||jakub at gcc dot gnu.org --- Comment #2 from Alexander Monakov --- I think it's intended for -ftrapping-math to cover this. Jakub's patch adding this folding functionality handles over/underflow cases, but looks like the situation in comment #0 is not handled: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01027.html
[Bug tree-optimization/86214] [8/9 Regression] Strongly increased stack usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86214 Alexander Monakov changed: What|Removed |Added Status|WAITING |NEW --- Comment #8 from Alexander Monakov --- Removing the 'waiting' status.
[Bug tree-optimization/86214] [8/9 Regression] Strongly increased stack usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86214

--- Comment #5 from Alexander Monakov ---
Sorry, this still seems over-reduced: the 'cmp' variable is uninitialized, and gcc-7 completely optimizes out the block with large stack usage guarded by 'cmp == 0' test, so gcc-7 vs gcc-8 is not directly comparable. It's strange that gcc-7 optimizes that out, but it's a different issue.

Can you attach the unreduced preprocessed source, and if you make another attempt at reducing, perhaps enable most warnings?

That said, it seems gcc is not very good at re-discovering non-overlapping stack allocations introduced by inlining. Looking at your testcase I came up with the following minimal test:

    struct S{~S();};
    void f(void *);
    inline void ff()
    {
        char c[1000];
        f(c);
    }
    void g(int n)
    {
        S s;
        char c[100];
        f(c);
        if (n) ff(), ff();
    }

(there's no regression vs. gcc-7 on this example, but gcc-4.6 used to get a better result by consuming 1100 bytes rather than 2100).
[Bug driver/86388] Enhancement: sort "valid arguments to '-march=' switch" suggestions alphabetically
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86388 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #1 from Alexander Monakov --- I'd prefer existing ordering relative to alphabetical: the list produced by GCC is mostly ordered first by manufacturer, then by generation/capabilities. Placement of the 'x86-64' entry seems odd, but, other than that, I like the order. The manual also orders -march entries in this "manufacturer-capabilities" style.
[Bug fortran/86350] Missed optimization with multiplication by zero
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86350 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #2 from Alexander Monakov --- The multiplication is optimized out under -ffinite-math-only -fno-signed-zeros (otherwise y can be NaN if bar returns infinity, for example). Why is it ok to optimize out the call to bar even though it's impure?
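The floating-point point above can be checked directly. The following is a minimal C sketch (the bug itself concerns Fortran; the function name `times_zero` is made up for illustration) showing that, under default IEEE semantics, multiplying by zero is not the same as producing zero:

```c
#include <math.h>

/* Multiplication by a literal zero.  Without -ffinite-math-only and
   -fno-signed-zeros the compiler has to keep the multiply, because the
   result depends on y: it may be NaN or negative zero.  */
double times_zero(double y)
{
    return 0.0 * y;
}

/* 0 * infinity is NaN, so the product is not always zero.  */
int zero_times_inf_is_nan(void)
{
    return isnan(times_zero(INFINITY)) != 0;
}

/* 0 * (negative finite) is -0.0, which differs from +0.0 in its sign bit.  */
int zero_times_negative_is_negative_zero(void)
{
    double r = times_zero(-1.0);
    return r == 0.0 && signbit(r) != 0;
}
```

Each of the two flags named in the comment removes one of these cases, which is presumably why both are needed before the multiplication can be folded away.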
[Bug middle-end/86311] gcc_qsort calls memcpy with overlaps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86311 Alexander Monakov changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from Alexander Monakov --- Thanks! Fixed by using memmove where memcpy might have been called with dst==src.
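As a rough illustration of the kind of fix described, here is a small sketch in which a copy may end up with identical source and destination. The helpers `place_element` and `store_min` are hypothetical and are not the `REORDER_23`/`REORDER_45` macros from sort.cc; they only show why `memmove` is the safe choice.

```c
#include <stddef.h>
#include <string.h>

/* Copy one element into place.  memcpy has undefined behavior when the
   regions overlap, which includes dst == src; memmove is well defined.  */
void place_element(void *dst, const void *src, size_t size)
{
    memmove(dst, src, size);
}

/* Store the smaller of *slot and *other into *slot.  When *slot is
   already the minimum, source and destination are the same object.  */
void store_min(int *slot, const int *other)
{
    const int *min = (*other < *slot) ? other : slot;
    place_element(slot, min, sizeof *slot);
}
```

Selecting the source pointer first and copying unconditionally avoids a branch around the copy, at the price of an occasional self-copy, so the copy routine has to tolerate `dst == src`.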
[Bug middle-end/86311] gcc_qsort calls memcpy with overlaps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86311

--- Comment #1 from Alexander Monakov ---
Author: amonakov
Date: Mon Jun 25 17:44:15 2018
New Revision: 262092

URL: https://gcc.gnu.org/viewcvs?rev=262092&root=gcc&view=rev
Log:
gcc_qsort: avoid overlapping memcpy (PR 86311)

        PR middle-end/86311
        * sort.cc (REORDER_23): Avoid memcpy with same destination and source.
        (REORDER_45): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sort.cc
[Bug tree-optimization/86214] [8/9 Regression] Strongly increased stack usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86214

Richard Biener changed:

What             |Removed                                         |Added
Keywords         |                                                |missed-optimization
Target Milestone |---                                             |8.2
Summary          |[8 Regression] Strongly increased stack usage   |[8/9 Regression] Strongly increased stack usage

Alexander Monakov changed:

What             |Removed      |Added
Status           |UNCONFIRMED  |WAITING
Last reconfirmed |             |2018-06-21
CC               |             |amonakov at gcc dot gnu.org
Ever confirmed   |0            |1

--- Comment #1 from Alexander Monakov ---
Inlining decisions are not so different between 7/8, the main difference is gcc-8 translates b::x() into __builtin_unreachable and warns accordingly:

    warning: no return statement in function returning non-void [-Wreturn-type]
     b x(b) {}
             ^

and with that change gcc-8 no longer manages to prove that big arrays have non-overlapping lifetimes. If I change the source to well-formed 'void x(b) {}', it compiles as desired.

So, assuming the original MySQL source is free of that warning, the testcase is too aggressively reduced and no longer reflects the original issue. Can you please re-reduce?
[Bug lto/86175] LTO code generator does not respect ld -u option to force symbol inclusion in the link product
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86175

Alexander Monakov changed:

What |Removed |Added
CC   |        |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov ---
> but really gets an empty blob from the LTO plugin for foo.

Are you sure about this? Compiling with -save-temps shows that the symbol is present in GCC's assembly output; specifying --print-gc-sections also shows that the linker is discarding it:

    /usr/bin/ld.bfd: Removing unused section '.text.KeepMe' in file '/tmp/ccWbtSKK.ltrans0.ltrans.o'

Gold linker does not exhibit this (try -fuse-ld=gold). Can you report it against the BFD linker at sourceware.org/bugzilla?
[Bug c/86174] Poor vectorization/register allocation with omp simd, FMA
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86174 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #1 from Alexander Monakov --- It might be useful to note that what the testcase "wants" to happen is for the compiler to notice that the temporary array 'double C[Si][Sk]' does not need to live in memory - ideally it would correspond to 8 256-bit (or 4 512-bit) registers.
[Bug c/86150] Trunk Segmentation Fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86150 Alexander Monakov changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||amonakov at gcc dot gnu.org Resolution|--- |INVALID --- Comment #1 from Alexander Monakov --- This is the *assembler* segfaulting, not the *compiler*. The assembly produced by trunk is not different from gcc-8 output on empty input, so it's probably some weird issue with Binutils installation for gcc-trunk worker(s) on Godbolt side.
[Bug c++/86094] [8/9 Regression] Call ABI changed for small objects with defaulted ctor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86094 --- Comment #3 from Alexander Monakov --- -fabi-version=12 is not documented, not mentioned in release notes, and not wired up in -Wabi.
[Bug rtl-optimization/86096] ICE: qsort checking failed (error: qsort comparator non-negative on sorted output: 0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86096

Alexander Monakov changed:

What             |Removed      |Added
Status           |UNCONFIRMED  |NEW
Last reconfirmed |             |2018-06-09
CC               |             |amonakov at gcc dot gnu.org
Ever confirmed   |0            |1

--- Comment #1 from Alexander Monakov ---
df_mw_compare has:

      if (mw1->mw_reg != mw2->mw_reg)
        return mw1->mw_order - mw2->mw_order;

Note mw_reg in the 'if' vs mw_order in the 'return'. This is invalid. It's simpler and more efficient to just use mw_order as the last tie-breaker regardless of mw_reg value.
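The suggested shape of the comparator can be sketched as below. The struct `mw_ref` and its two fields are simplified, hypothetical stand-ins for the real df multiword-hardreg record, and the function is not the actual `df_mw_compare`; it only shows a tie-breaker that is applied unconditionally so the ordering stays consistent.

```c
#include <stdlib.h>

/* Simplified, hypothetical record: a primary key plus a unique
   insertion order that serves as the final tie-breaker.  */
struct mw_ref {
    unsigned start_regno;
    unsigned order;
};

int mw_ref_compare(const void *pa, const void *pb)
{
    const struct mw_ref *a = pa;
    const struct mw_ref *b = pb;

    if (a->start_regno != b->start_regno)
        return a->start_regno < b->start_regno ? -1 : 1;

    /* Last tie-breaker, used regardless of any other field.  Comparing
       instead of subtracting also avoids unsigned wraparound.  */
    return (a->order > b->order) - (a->order < b->order);
}
```

Because the result never depends on a field that was not itself compared, swapping the arguments always negates the result, which is the property that qsort checking relies on.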
[Bug c++/86094] New: [8/9 Regression] Call ABI changed for small objects with defaulted ctor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86094

Bug ID: 86094
Summary: [8/9 Regression] Call ABI changed for small objects with defaulted ctor
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: ABI, wrong-code
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

When compiling the following with -O2 -std=c++11:

    struct S {
        S(S&&) = default;
        int i;
    };

    S foo(S s)
    {
        return s;
    }

gcc-7 and earlier emit

    _Z3foo1S:
            movl    %edi, %eax
            ret

but gcc-8 and trunk emit

    _Z3foo1S:
            movl    (%rsi), %edx
            movq    %rdi, %rax
            movl    %edx, (%rdi)
            ret

i.e. the object is now passed in memory rather than in a register. This appears to be a silent ABI change.

(Clang generates the same code as gcc-7)
[Bug c/86093] [8/9 Regression] volatile ignored on pointer in C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86093 Alexander Monakov changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2018-06-08 CC||amonakov at gcc dot gnu.org Known to work||7.3.0 Summary|volatile ignored on pointer |[8/9 Regression] volatile |in C|ignored on pointer in C Ever confirmed|0 |1 Known to fail||8.1.0, 9.0 --- Comment #1 from Alexander Monakov --- gcc-7 got this right.
[Bug tree-optimization/86072] Poor codegen with atomics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86072 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #3 from Alexander Monakov --- As for the segfault mentioned in comment 0, this is not a compiler bug: it's the assembler segfaulting, and it segfaults even with an empty source, so it's probably an issue/misconfiguration on the godbolt.org side.
[Bug tree-optimization/86071] -O0 -foptimize-sibling-calls doesn't optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86071

Alexander Monakov changed:

What       |Removed      |Added
Status     |UNCONFIRMED  |RESOLVED
CC         |             |amonakov at gcc dot gnu.org
Resolution |---          |INVALID

--- Comment #1 from Alexander Monakov ---
In GCC there's no way to selectively enable a few optimizations with their -f flags at -O0 level: -O0 means that optimizations are completely disabled, regardless of -f flags. This is mentioned in the manual:

"Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified."

Tail call optimization sometimes is not applied because there's an escaping local variable (possibly from an inlined function), and GCC does not take into account its life range. This might be what you're seeing at -O3. There's a recent report: PR 86050.
[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330

Alexander Monakov changed:

What |Removed |Added
CC   |        |amonakov at gcc dot gnu.org

--- Comment #16 from Alexander Monakov ---
What do you think about the suggestion made in the most recent duplicate, namely expanding GIMPLE pointer-to-integer casts to non-transparent RTL assignments, i.e. going from

    val = (intptr_t) ptr;

to

    asm ("" : "=g" (rval) : "0" (rptr));

Wouldn't this plug the hole in one shot instead of chasing down missing REG_POINTERs in multiple RTL passes?
[Bug c/86026] Document and/or change allowed operations on integer representation of pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86026

--- Comment #3 from Alexander Monakov ---
Tree optimizations already manage to avoid "optimizing" f_intadd, but unfortunately on RTL, types and casts are not visible in the IR, and various passes make no distinction between (char*)((uintptr_t)t + o) and (t + o).

Perhaps GCC should consider lowering pointer-to-integer casts to a non-transparent assignment, making the result alias everything for the purposes of RTL alias analysis, akin to

    char __attribute__ ((noinline)) f_intadd1(ptrdiff_t o)
    {
        g = 1;
        uintptr_t t1 = (uintptr_t)t;
        asm("" : "+g"(t1));
        *(char*)(t1 + o) = 2;
        return g;
    }
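For experimenting with this idea, a self-contained variant might look like the sketch below. The declarations of `g` and `t` are assumptions made up for this example (they are not shown in the thread), the helper name `opaque` is hypothetical, and the empty `__asm__` with a `"+r"` operand is a GCC/Clang extension rather than standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed declarations, for illustration only.  */
char g;
char *t = &g;

/* Pass an integer through an empty asm statement.  The compiler must
   treat the output as an unknown value, so it can no longer assume
   where the resulting address points.  */
static inline uintptr_t opaque(uintptr_t v)
{
    __asm__("" : "+r"(v));
    return v;
}

char f_intadd1(ptrdiff_t o)
{
    g = 1;
    uintptr_t t1 = opaque((uintptr_t)t);
    *(char *)(t1 + o) = 2;   /* may alias g, so g has to be reloaded */
    return g;
}
```

With these assumed declarations and `o == 0`, the store lands on `g` itself, so the function has to return 2 rather than a cached 1.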
[Bug c/86026] Document and/or change allowed operations on integer representation of pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86026 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #1 from Alexander Monakov --- Please add full testcase source, the snippet is missing (at least) declarations of 'g' and 't'. The Godbolt link does not work correctly for me right now, and in general such links are not reliable long-term.
[Bug target/85994] Comparison failure in 64-bit libgcc *_{sav,res}ms64*.o on Solaris/x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85994 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #2 from Alexander Monakov --- Why does this affect only new files, i.e. how did existing libgcc .S files avoid running into the same issue?
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #10 from Alexander Monakov ---
Also note that both the original and the reduced testcase can be tweaked to exhibit the surprising transformation even when -fexcess-precision=standard is enabled. A "lazy" way is via -mpc64, but I think it's possible even without the additional option (by making the code more convoluted to enforce rounding to double).

Here's what happens on the reduced testcase:

    $ gcc -m32 d.c -O -fdisable-tree-dom3 && ./a.out
    cc1: note: disable pass tree-dom3 for functions in the range of [0, 4294967295]
    1 == 0

    $ gcc -m32 d.c -O -fdisable-tree-dom3 -fexcess-precision=standard -mpc64 && ./a.out
    cc1: note: disable pass tree-dom3 for functions in the range of [0, 4294967295]
    0 == 1
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #9 from Alexander Monakov --- Sorry, the above comment should have said 'b * 1e6' every time it said 'b'.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #8 from Alexander Monakov ---
To expand a bit: DOM makes the small testcase behave as if 'b' and 'ib' are evaluated twice:

* one time, 'b' is evaluated in precision matching 'a' (either infinite or double), and 'ib' is evaluated to 1; this instance is used in 'ia == ib' comparison;

* a second time, 'b' is evaluated in extended precision and 'ib' is evaluated to 0; this instance is passed as the last argument to printf.

This is surprising as the original program clearly evaluates 'b' and 'ib' just once.

If there's no bug in DOM and the observed transformation is allowed to happen when -fexcess-precision=fast is in effect, I think it would be nice to mention that in the compiler manual.
[Bug target/85961] scratch register rsi used after function call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85961

Alexander Monakov changed:

What |Removed |Added
CC   |        |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov ---
You'd need to disable IPA-RA after forcing -O2 with the pragma, i.e.:

    #pragma GCC optimize "O2"
    #pragma GCC optimize "no-ipa-ra"

We already have logic to disable IPA-RA when instrumentation/profiling is active, but it's done once in toplev.c. Here the pragma re-enables IPA-RA after toplev.c:process_options() has disabled it. Do we want to adjust it given that "pragma optimize" is documented as "not suitable for production use"?
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

Alexander Monakov changed:

What             |Removed    |Added
Status           |RESOLVED   |REOPENED
Last reconfirmed |           |2018-05-29
CC               |           |amonakov at gcc dot gnu.org
Resolution       |DUPLICATE  |---
Ever confirmed   |0          |1

--- Comment #7 from Alexander Monakov ---
Reopening, the issue here is way more subtle than bug 323 and points to a possible issue in DOM. Hopefully Richi can have a look and comment.

It appears the dom2 pass performs something like jump threading based on compile-time-evaluated floating-point expression values without also substituting those expressions in the IR. At run time, they are evaluated to different values, leading to an inconsistency.

Namely, dom2 creates bb 10:

      <bb 9>:
      # iftmp.1_1 = PHI <"true"(7), "false"(8), "true"(10)>
      printf ("(a6 == b6) = %s\n", iftmp.1_1);
      return 0;

      <bb 10>:
      _24 = __n2_13 * 1.0e+6;
      b6_25 = (guint64) _24;
      printf ("a6 = %llu\n", 1);
      printf ("b6 = %llu\n", b6_25);
      goto <bb 9>;

where the jump to bb 9 implies that _24 evaluates to 1.0 and b6_25 to 1, but they are not substituted as such, and at run time evaluate to 0.99... and 0 due to excess precision.

The following reduced testcase demonstrates the same issue, but requires -fdisable-tree-dom3 (on gcc-6 at least, as otherwise dom3 substitutes results of compile-time evaluation).

    __attribute__((noinline,noclone))
    static double f(void)
    {
      return 1e-6;
    }

    int main(void)
    {
      double a = 1e-6, b = f();
      if (a != b) __builtin_printf("uneq");
      unsigned long long ia = a * 1e6, ib = b * 1e6;
      __builtin_printf("%lld %s %lld\n", ia, ia == ib ? "==" : "!=", ib);
    }
[Bug bootstrap/85921] /gcc/c-family/c-warn.c fails to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85921 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #5 from Alexander Monakov --- Glibc bits/sigcontext.h should not include Linux asm/sigcontext.h (but it used to on i386). This was fixed back in 2012 for Glibc 2.16 by this Glibc commit: https://sourceware.org/git/?p=glibc.git;a=commit;h=48495318fa5ae223a8b777ed144bd769d9f6c67f I doubt this warrants a change on GCC side, given that a workaround is simple.
[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099 Bug 85099 depends on bug 79985, which changed state. Bug 79985 Summary: ICE in code_motion_path_driver, at sel-sched.c:6580 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #10 from Alexander Monakov --- Fixed.
[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

--- Comment #9 from Alexander Monakov ---
Author: amonakov
Date: Wed May 23 15:01:28 2018
New Revision: 260613

URL: https://gcc.gnu.org/viewcvs?rev=260613&root=gcc&view=rev
Log:
df-scan: remove ad-hoc handling of global regs in asms

        PR rtl-optimization/79985
        * df-scan.c (df_insn_refs_collect): Remove special case for
        global registers and asm statements.

testsuite/
        * gcc.dg/pr79985.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/pr79985.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/df-scan.c
    trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/80318] GCC takes too much RAM and time compiling a template file (var-tracking)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80318

Alexander Monakov changed:

What |Removed |Added
CC   |        |amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov ---
Second largest seems to be the frontend, as with -fsyntax-only we still need 18s and 1.8GB (this is 8.1 with release checking):

Time variable                            usr            sys           wall                GGC
 phase setup                    :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)    1381 kB (  0%)
 phase parsing                  :   4.01 ( 22%)   0.80 ( 30%)   4.82 ( 23%)  519422 kB ( 27%)
 phase lang. deferred           :  13.96 ( 78%)   1.83 ( 70%)  15.82 ( 77%) 1414614 kB ( 73%)
 |name lookup                   :   1.89 ( 11%)   0.35 ( 13%)   2.08 ( 10%)   99986 kB (  5%)
 |overload resolution           :   8.94 ( 50%)   1.29 ( 49%)  10.10 ( 49%)  934750 kB ( 48%)
 garbage collection             :   1.79 ( 10%)   0.00 (  0%)   1.80 (  9%)       0 kB (  0%)
 preprocessing                  :   0.14 (  1%)   0.12 (  5%)   0.37 (  2%)    2890 kB (  0%)
 parser (global)                :   0.58 (  3%)   0.21 (  8%)   0.73 (  4%)  115783 kB (  6%)
 parser struct body             :   0.74 (  4%)   0.09 (  3%)   0.77 (  4%)   81383 kB (  4%)
 parser enumerator list         :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     364 kB (  0%)
 parser function body           :   0.05 (  0%)   0.03 (  1%)   0.07 (  0%)    4688 kB (  0%)
 parser inl. func. body         :   0.10 (  1%)   0.01 (  0%)   0.12 (  1%)    6402 kB (  0%)
 parser inl. meth. body         :   0.41 (  2%)   0.06 (  2%)   0.39 (  2%)   27538 kB (  1%)
 template instantiation         :  13.86 ( 77%)   2.06 ( 78%)  16.02 ( 78%) 1694216 kB ( 88%)
 constant expression evaluation :   0.26 (  1%)   0.04 (  2%)   0.27 (  1%)     729 kB (  0%)
 varconst                       :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)      39 kB (  0%)
 symout                         :   0.02 (  0%)   0.01 (  0%)   0.07 (  0%)       0 kB (  0%)
 TOTAL                          :  17.97          2.63         20.65        1935427 kB
[Bug c++/85783] alloc-size-larger-than fires incorrectly with new[] and can't be disabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85783 Alexander Monakov changed: What|Removed |Added Status|RESOLVED|REOPENED Last reconfirmed||2018-05-16 CC||amonakov at gcc dot gnu.org Resolution|WONTFIX |--- Ever confirmed|0 |1 --- Comment #10 from Alexander Monakov --- Reopening: the request to be able to disable the warning (via -Wno-alloc-size-larger-than) is valid and should be addressed.
[Bug target/41084] Filling xmm register with all bit set is not optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41084 Alexander Monakov changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||amonakov at gcc dot gnu.org Resolution|--- |FIXED --- Comment #2 from Alexander Monakov --- Starting from gcc-4.5 (released in 2010) GCC emits pcmpeq for the explicit-constructor variant (where it would previously emit a load) as well as for a more concise form: __m128i r = {-1, -1}; The implicit variant with _mm_cmpeq_epi32 is optimized as expected starting with gcc-5 (released in 2015). So as far as I can see both issues raised in this report have been addressed in the meantime. If there are other cases that are not well optimized, please let us know (they deserve separate bug reports).
[Bug tree-optimization/85758] New: questionable bitwise folding (missing single use check?)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85758

Bug ID: 85758
Summary: questionable bitwise folding (missing single use check?)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

The following should be translated as-is:

void f(int a, int b);
void g(int a, int b, int m, int s)
{
    m &= s;
    a += m;
    m ^= s;
    b += m;
    f(a, b);
}

However instead of and/add/xor/add we get mov/not/and/and/add/add:

    movl    %edx, %eax
    notl    %edx
    andl    %ecx, %eax
    andl    %edx, %ecx
    addl    %eax, %edi
    addl    %ecx, %esi
    jmp     f

This is because forwprop applies an identity to m = (m & s) ^ s:

g (int a, int b, int m, int s)
{
  <bb 2> :
  m_3 = m_1(D) & s_2(D);
  a_5 = a_4(D) + m_3;
  m_6 = m_3 ^ s_2(D);
  b_8 = b_7(D) + m_6;
  f (a_5, b_8);
  return;

}

gimple_simplified to _11 = ~m_1(D);
m_6 = s_2(D) & _11;

g (int a, int b, int m, int s)
{
  int _11;

  <bb 2> :
  m_3 = m_1(D) & s_2(D);
  a_5 = m_3 + a_4(D);
  _11 = ~m_1(D);
  m_6 = s_2(D) & _11;
  b_8 = m_6 + b_7(D);
  f (a_5, b_8);
  return;

}

However since m_3 is used elsewhere (it feeds a_5), the folded form is more costly: the original AND must be kept and a NOT plus a second AND are added. Shouldn't this folding check for single use of the intermediate expr? From a quick look, this is probably match.pd:

/* Fold (X & Y) ^ Y and (X ^ Y) & Y as ~X & Y. */
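For readers who want to convince themselves of the identity behind the folding, here is a minimal C sketch. It is not taken from the report: the helper names xor_form, andnot_form and count_mismatches are made up for illustration. It only checks that (m & s) ^ s and ~m & s agree; it says nothing about which form is cheaper when the intermediate m & s has other uses.

```c
#include <stdint.h>

/* Form written in the source: (m & s) ^ s. */
static uint32_t xor_form(uint32_t m, uint32_t s)
{
    return (m & s) ^ s;
}

/* Form produced by the folding: ~m & s. */
static uint32_t andnot_form(uint32_t m, uint32_t s)
{
    return ~m & s;
}

/* Compare both forms on all pairs of 8-bit operands.  The identity is
   purely bitwise, so every per-bit combination is covered.  Returns the
   number of operand pairs on which the two forms disagree.  */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t m = 0; m < 256; m++)
        for (uint32_t s = 0; s < 256; s++)
            if (xor_form(m, s) != andnot_form(m, s))
                bad++;
    return bad;
}
```

Both forms need two operations on their own; the extra cost described in the report only appears when m & s must be kept alive for another user, as it is for a_5 in the testcase.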
[Bug tree-optimization/85757] New: tree optimizers fail to fully clean up fixed-size memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85757

Bug ID: 85757
Summary: tree optimizers fail to fully clean up fixed-size memcpy
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

This is minimized from one of suboptimal stack consumption issues in gcc_qsort. gcc_qsort uses code similar to this to move potentially-unaligned data:

void f(int n, char *p0, char *p1, char *p2, char *o)
{
    int t0, t1;
    __builtin_memcpy(&t0, p0, 1);
    __builtin_memcpy(&t1, p1, 1);
    if (n==3)
        __builtin_memcpy(o+2, p2, 1);
    __builtin_memcpy(o+0, &t0, 1);
    __builtin_memcpy(o+1, &t1, 1);
}

Note the mismatch between memcpy size (1) and temporaries' size (4). If the sizes match, there's no problem. If not, tree optimizers fail to fully clean up the copies (and, unlike in this minimal testcase, in full gcc_qsort RTL optimizers can't clean it up either and we get dead stack stores).

The .optimized dump reads (note dead writes to t0 and t1 in BB 2):

f (int n, char * p0, char * p1, char * p2, char * o)
{
  int t1;
  int t0;
  unsigned char _4;
  unsigned char _7;
  unsigned char _12;

  <bb 2> [local count: 1073741825]:
  _4 = MEM[(char * {ref-all})p0_3(D)];
  MEM[(char * {ref-all})&t0] = _4;
  _7 = MEM[(char * {ref-all})p1_6(D)];
  MEM[(char * {ref-all})&t1] = _7;
  if (n_9(D) == 3)
    goto <bb 3>; [34.00%]
  else
    goto <bb 4>; [66.00%]

  <bb 3> [local count: 365072220]:
  _12 = MEM[(char * {ref-all})p2_11(D)];
  MEM[(char * {ref-all})o_10(D) + 2B] = _12;

  <bb 4> [local count: 1073741825]:
  MEM[(char * {ref-all})o_10(D)] = _4;
  MEM[(char * {ref-all})o_10(D) + 1B] = _7;
  t0 ={v} {CLOBBER};
  t1 ={v} {CLOBBER};
  return;

}
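To make the "sizes match" remark concrete, here is a small C sketch that puts the mismatched shape next to a matched-size variant. This is an illustration only: the names f_mismatched and f_matched are hypothetical, plain memcpy stands in for __builtin_memcpy, and nothing here is meant to reproduce gcc_qsort itself. Both functions produce the same bytes; the difference the report is about is only visible in the optimizer dumps.

```c
#include <string.h>

/* Shape from the report: 1-byte copies staged through 4-byte temporaries.
   Only the first byte of each temporary is ever written or read.  */
static void f_mismatched(int n, const char *p0, const char *p1,
                         const char *p2, char *o)
{
    int t0, t1;
    memcpy(&t0, p0, 1);
    memcpy(&t1, p1, 1);
    if (n == 3)
        memcpy(o + 2, p2, 1);
    memcpy(o + 0, &t0, 1);
    memcpy(o + 1, &t1, 1);
}

/* Variant where the temporaries have the same size as the copies.  */
static void f_matched(int n, const char *p0, const char *p1,
                      const char *p2, char *o)
{
    char t0, t1;
    memcpy(&t0, p0, 1);
    memcpy(&t1, p1, 1);
    if (n == 3)
        memcpy(o + 2, p2, 1);
    memcpy(o + 0, &t0, 1);
    memcpy(o + 1, &t1, 1);
}
```

Compiling both with -O2 -fdump-tree-optimized and comparing the dumps is one way to see whether the stores to the temporaries survive in the mismatched case.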
[Bug target/85683] [8 Regression] GCC 8 stopped using RMW (Read Modify Write) instructions on x86[_64]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85683 Alexander Monakov changed: What|Removed |Added Keywords||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2018-05-07 CC||amonakov at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Alexander Monakov --- Smaller testcase: void f(void); void g(int *p) { if (!--*p) f(); } On gcc-7.3 this is optimized by the peephole2 pass so it doesn't really help with register pressure (combine pass seems more suitable for that); don't know why the peephole doesn't trigger on gcc-8.
[Bug rtl-optimization/85673] ICE in create_pre_exit, at mode-switching.c:451
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85673 Alexander Monakov changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2018-05-06 CC||abel at gcc dot gnu.org, ||amonakov at gcc dot gnu.org Blocks||84301 Assignee|unassigned at gcc dot gnu.org |amonakov at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Alexander Monakov --- PR 84301 is related (not backported to 6/7 so failure is expected there). The fix was incomplete because 'cant_move' insn flag only restricts inter-block motion (argh!), so sel-sched is still free to move %eax assignment up. Oops. Perhaps we can additionally set sched_group_p in add_branch_dependences for pre-RA sel-sched to ensure insns stay at the end of basic block; after reload that would also pin mutex_p cond-exec insns to BB end as well. (apropos: flag_sched_group_heuristic should be removed, the way it's used in rank_for_schedule is not a heuristic, but a correctness requirement) Overall I'm concerned that mode-switching is making unreasonable assumptions, if it really needs that some insns stay in sequence just before function return, they should be arranged to have a barrier insn or SCHED_GROUP_P from the beginning. So maybe it's better to adjust mode-switching instead, but unfortunately it's not quite obvious how it works :) Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301 [Bug 84301] [6/7 Regression] ICE in create_pre_exit, at mode-switching.c:451
[Bug rtl-optimization/84842] [7/8/9 Regression] ICE in verify_target_availability, at sel-sched.c:1569
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842 --- Comment #14 from Alexander Monakov --- Thanks. I think the root cause on this x86_64 testcase is different. Arseny, in the meantime if by chance you have another x86_64 variant of this failure that doesn't require -funroll-all-loops, please post it as well.
[Bug inline-asm/85546] GCC assumes volatile asm block returns same value in loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85546 Alexander Monakov changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||amonakov at gcc dot gnu.org Resolution|--- |INVALID --- Comment #3 from Alexander Monakov --- I'm not sure Richard is correct about the definition of volatile asms: similar to reads of volatile objects, volatile asms can produce different output on each invocation (iow they are not pure/const). In any case the inline asm in io() is missing clobbers for rcx, r11 and memory, which makes the bug invalid.
[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099 Bug 85099 depends on bug 85423, which changed state. Bug 85423 Summary: [8 Regression] ICE in code_motion_process_successors, at sel-sched.c:6403 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85423 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/80463] [6/7 Regression] ICE with -fselective-scheduling2 and -fvar-tracking-assignments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80463 Bug 80463 depends on bug 85423, which changed state. Bug 85423 Summary: [8 Regression] ICE in code_motion_process_successors, at sel-sched.c:6403 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85423 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/85423] [8 Regression] ICE in code_motion_process_successors, at sel-sched.c:6403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85423 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Blocks||80463 Resolution|--- |FIXED --- Comment #7 from Alexander Monakov --- Thanks. I've added one more "Blocks" edge to indicate that this should be taken when backporting the earlier patch. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80463 [Bug 80463] [6/7 Regression] ICE with -fselective-scheduling2 and -fvar-tracking-assignments
[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

--- Comment #8 from Alexander Monakov ---
Unfortunately the above doesn't fully address the issue, as schedulers and other passes still have no idea that DF makes those assumptions and will allow reordering of asms:

register int r asm("ebx");

int f(int x, int y)
{
    int t = x/y/r;
    asm("#asm" );
    return t-x;
}

_Z1fii:
#APP
    #asm
#NO_APP
    movl    %edi, %eax
    cltd
    idivl   %esi
    cltd
    idivl   %ebx
    subl    %edi, %eax
    ret

See how the asm is first, even though from DF point of view it should remain after the read of %ebx for division by r; here cprop_hardreg makes the offending propagation.

So currently GCC has a rather split personality when it comes to deps w.r.t global reg vars in asm statements. The documentation should spell out the intended behavior. My suggestion is to require that references are exposed to the compiler via constraints, allowing to remove the ad-hoc treatment in DF. I intend to do that early in stage 1.
[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

--- Comment #7 from Alexander Monakov ---
Or rather like this:

diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 95e1e0df2d5..732705c0385 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -3207,11 +3207,11 @@ df_insn_refs_collect (struct df_collection_rec *collection_rec,
   if (CALL_P (insn_info->insn))
     df_get_call_refs (collection_rec, bb, insn_info, flags);
 
-  if (asm_noperands (PATTERN (insn_info->insn)) >= 0)
+  if (GET_CODE (PATTERN (insn_info->insn)) == ASM_INPUT)
     for (unsigned i = 0; i < FIRST_PSEUDO_REGISTER; i++)
       if (global_regs[i])
         {
-          /* As with calls, asm statements reference all global regs. */
+          /* As with calls, basic asms reference all global regs. */
           df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
                          NULL, bb, insn_info, DF_REF_REG_USE, flags);
           df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
[Bug rtl-optimization/84842] [7/8 Regression] ICE in verify_target_availability, at sel-sched.c:1569
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

Alexander Monakov changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2018-04-17 Ever confirmed|0 |1

--- Comment #11 from Alexander Monakov ---
Thanks, I managed to reproduce it. The unusual thing here is hardreg 63 being considered call-clobbered in its reg_raw_mode=TImode but not narrower modes. We have

(insn 97 29 98 4 (set (reg:DI 63 31 [160])
        (unspec:DI [
                (reg:SI 29 29)
            ] UNSPEC_LFIWAX)) "pr84842.i":5 344 {lfiwax}
     (expr_list:REG_DEAD (reg:SI 29 29)
        (nil)))

and sched-deps noting a REG_DEP_OUTPUT dependence on regno 63 against a preceding call insn according to rs6000_hard_regno_call_part_clobbered (regno=63, mode=E_TImode). I assume what the backend is conveying there is that only the low part of the register will be preserved by callees.

However, when we move up the instruction we don't have a dependence. The LHS is DImode, so that seems correct as well: sched-deps had a more conservative answer because its dependence lists are not separated per mode.

Andrey, does the above make sense? Can the assert be relaxed?
[Bug tree-optimization/85416] Massive performance regression when switching on "-march=native"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85416 Alexander Monakov changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #14 from Alexander Monakov --- Ah, the linked report actually says very clearly that fixes landed in Glibc 2.25, so I'll close this bug: nothing to do on GCC side about this.
[Bug tree-optimization/85416] Massive performance regression when switching on "-march=native"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85416 --- Comment #13 from Alexander Monakov --- This is most likely a variant of https://bugzilla.redhat.com/show_bug.cgi?id=1421121 so hitting this bug requires a specific CPU model. It looks as if SSE-AVX transition penalties appear when switching between pure-SSE sinf code and VEX-prefixed SSE code in the main program after the ld.so runtime resolver affects AVX state tracking in the CPU. I'm not sure if any patches have landed on Glibc side to avoid this, but in any case this should be re-reported against Glibc if needed, GCC cannot improve the situation. An easy workaround would be to pass -Wl,-z,now when linking.
[Bug tree-optimization/85416] Massive performance regression when switching on "-march=native"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85416 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #8 from Alexander Monakov --- Can you also run the tests under 'perf stat'?
[Bug rtl-optimization/84842] ICE in verify_target_availability, at sel-sched.c:1569
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842 --- Comment #8 from Alexander Monakov --- Or as Jakub (thanks!) noted on IRC, gcc/auto-host.h from the build tree may be also helpful and simpler for us to work with.
[Bug rtl-optimization/84842] ICE in verify_target_availability, at sel-sched.c:1569
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842 --- Comment #7 from Alexander Monakov --- The testcase is not easily reproducible because the rs6000 backend has some implicit dependencies on capabilities of configure-time binutils, and they are not visible as 'gcc -v' flags. So, to reproduce this we need to know the version and configure flags of cross binutils that were found and checked by gcc's configure.
[Bug rtl-optimization/84842] ICE in verify_target_availability, at sel-sched.c:1569
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #4 from Alexander Monakov --- Can you please share tree and rtl dumps for the nice testcase in comment #3 by re-running it with -fdump-tree-all -fdump-rtl-all and attaching a tar.gz with those? I could not reproduce it either, so having the dumps might help us see what's different on our side. (and an additional archive for a non-failing run without -fselective-scheduling2 might be helpful too)
[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

Alexander Monakov changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |amonakov at gcc dot gnu.org

--- Comment #6 from Alexander Monakov ---
Candidate patch for gcc-9 stage 1:

diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 95e1e0df2d5..4708fc328c6 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -3207,7 +3207,8 @@ df_insn_refs_collect (struct df_collection_rec *collection_rec,
   if (CALL_P (insn_info->insn))
     df_get_call_refs (collection_rec, bb, insn_info, flags);
 
-  if (asm_noperands (PATTERN (insn_info->insn)) >= 0)
+  if (asm_noperands (PATTERN (insn_info->insn)) >= 0
+      && volatile_insn_p (PATTERN (insn_info->insn)))
     for (unsigned i = 0; i < FIRST_PSEUDO_REGISTER; i++)
       if (global_regs[i])
         {
[Bug rtl-optimization/84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in bb_note) w/ selective scheduling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659 Bug 84659 depends on bug 85354, which changed state. Bug 85354 Summary: [8 regression] ICE with gcc.dg/graphite/pr84872.c starting with r259313 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/85354] [8 regression] ICE with gcc.dg/graphite/pr84872.c starting with r259313
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Blocks||84659 Resolution|--- |FIXED --- Comment #5 from Alexander Monakov --- Fixed. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659 [Bug 84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in bb_note) w/ selective scheduling
[Bug rtl-optimization/85354] [8 regression] ICE with gcc.dg/graphite/pr84872.c starting with r259313
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354

--- Comment #4 from Alexander Monakov ---
Author: amonakov
Date: Thu Apr 12 15:40:44 2018
New Revision: 259348

URL: https://gcc.gnu.org/viewcvs?rev=259348&root=gcc&view=rev
Log:
sel-sched: move cleanup_cfg before calculate_dominance_info (PR 85354)

    PR rtl-optimization/85354
    * sel-sched-ir.c (sel_init_pipelining): Move cfg_cleanup call...
    * sel-sched.c (sel_global_init): ... here.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sel-sched-ir.c
    trunk/gcc/sel-sched.c
[Bug rtl-optimization/85354] [8 regression] ICE with gcc.dg/graphite/pr84872.c starting with r259313
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354

Alexander Monakov changed: What|Removed |Added CC||abel at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov ---
Thanks. Judging from the backtrace, we shouldn't call cleanup_cfg after dominators are computed: it will invalidate dominators without freeing or fixing them. I wonder if that's "by design".

A simple way out is to run cleanup_cfg early enough. I'll bootstrap/regtest the following on gcc112:

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 50a7daafba6..ee970522890 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -30,7 +30,6 @@ along with GCC; see the file COPYING3. If not see
 #include "cfgrtl.h"
 #include "cfganal.h"
 #include "cfgbuild.h"
-#include "cfgcleanup.h"
 #include "insn-config.h"
 #include "insn-attr.h"
 #include "recog.h"
@@ -6122,9 +6121,6 @@ make_regions_from_loop_nest (struct loop *loop)
 void
 sel_init_pipelining (void)
 {
-  /* Remove empty blocks: their presence can break assumptions elsewhere,
-     e.g. the logic to invoke update_liveness_on_insn in sel_region_init. */
-  cleanup_cfg (0);
   /* Collect loop information to be used in outer loops pipelining. */
   loop_optimizer_init (LOOPS_HAVE_PREHEADERS
                        | LOOPS_HAVE_FALLTHRU_PREHEADERS
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index cd29df35666..59762964c6e 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3. If not see
 #include "tm_p.h"
 #include "regs.h"
 #include "cfgbuild.h"
+#include "cfgcleanup.h"
 #include "insn-config.h"
 #include "insn-attr.h"
 #include "params.h"
@@ -7661,6 +7662,10 @@ sel_sched_region (int rgn)
 static void
 sel_global_init (void)
 {
+  /* Remove empty blocks: their presence can break assumptions elsewhere,
+     e.g. the logic to invoke update_liveness_on_insn in sel_region_init. */
+  cleanup_cfg (0);
+
   calculate_dominance_info (CDI_DOMINATORS);
   alloc_sched_pools ();
[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #5 from Alexander Monakov --- Fixed.
[Bug middle-end/82407] [meta-bug] qsort_chk fallout tracking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82407 Bug 82407 depends on bug 84566, which changed state. Bug 84566 Summary: error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099 Bug 85099 depends on bug 84566, which changed state. Bug 84566 Summary: error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug target/84301] [6/7 Regression] ICE in create_pre_exit, at mode-switching.c:451
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301 Alexander Monakov changed: What|Removed |Added Known to work||8.0 Assignee|unassigned at gcc dot gnu.org |amonakov at gcc dot gnu.org Summary|[6/7/8 Regression] ICE in |[6/7 Regression] ICE in |create_pre_exit, at |create_pre_exit, at |mode-switching.c:451|mode-switching.c:451 Known to fail|8.0 | --- Comment #6 from Alexander Monakov --- Fixed on the trunk.
[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

--- Comment #4 from Alexander Monakov ---
Author: amonakov
Date: Wed Apr 11 14:36:04 2018
New Revision: 259322

URL: https://gcc.gnu.org/viewcvs?rev=259322&root=gcc&view=rev
Log:
sched-deps: respect deps->readonly in macro-fusion (PR 84566)

    PR rtl-optimization/84566
    * sched-deps.c (sched_analyze_insn): Check deps->readonly when invoking
    sched_macro_fuse_insns.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sched-deps.c
[Bug target/84301] [6/7/8 Regression] ICE in create_pre_exit, at mode-switching.c:451
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301

--- Comment #5 from Alexander Monakov ---
Author: amonakov
Date: Wed Apr 11 14:32:32 2018
New Revision: 259321

URL: https://gcc.gnu.org/viewcvs?rev=259321&root=gcc&view=rev
Log:
sched-rgn: run add_branch_dependencies for sel-sched (PR 84301)

    PR target/84301
    * sched-rgn.c (add_branch_dependences): Move sel_sched_p check here...
    (compute_block_dependences): ... from here.

testsuite/
    * gcc.target/i386/pr84301.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr84301.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sched-rgn.c
    trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in bb_note) w/ selective scheduling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659

Alexander Monakov changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |amonakov at gcc dot gnu.org Summary|[6/7/8 Regression] ICE: |[6/7 Regression] ICE: |Segmentation fault (stack |Segmentation fault (stack |overflow in bb_note) w/ |overflow in bb_note) w/ |selective scheduling|selective scheduling

--- Comment #3 from Alexander Monakov ---
Fixed on the trunk. Unfortunately the Changelog entry had a typo in the PR#:

Author: amonakov
Date: Wed Apr 11 10:40:07 2018
New Revision: 259313

URL: https://gcc.gnu.org/viewcvs?rev=259313&root=gcc&view=rev
Log:
sel-sched: run cleanup_cfg just before loop_optimizer_init (PR 84659)

    PR rtl-optimization/84659
    * sel-sched-ir.c (sel_init_pipelining): Invoke cleanup_cfg.

testsuite/
    * gcc.dg/pr84659.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr84659.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sel-sched-ir.c
    trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in bb_note) w/ selective scheduling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659

--- Comment #4 from Alexander Monakov ---
Author: amonakov
Date: Wed Apr 11 10:48:42 2018
New Revision: 259314

URL: https://gcc.gnu.org/viewcvs?rev=259314&root=gcc&view=rev
Log:
fix PR 84659 references in ChangeLog files

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
[Bug tree-optimization/85275] New: copyheader peels off almost the entire iteration
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85275

Bug ID: 85275
Summary: copyheader peels off almost the entire iteration
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

I expected predcom to eliminate one of the loads in this loop at -O3:

int is_sorted(int *a, int n)
{
    for (int i = 0; i < n - 1; i++)
        if (a[i] > a[i + 1])
            return 0;
    return 1;
}

Unfortunately, predcom bails out since the loads it sees are not always-executed. Ideally loop header copying would make this a suitable do-while loop, but in this case it duplicates too much:

;; Loop 1
;; header 5, latch 4
;; depth 1, outer 0
;; nodes: 5 4 3
;; 2 succs { 5 }
;; 3 succs { 6 4 }
;; 4 succs { 5 }
;; 5 succs { 3 6 }
;; 6 succs { 1 }
Analyzing loop 1
Loop 1 is not do-while loop: latch is not empty.
Will duplicate bb 5
Will duplicate bb 3
Not duplicating bb 4: it is single succ.
Duplicating header of the loop 1 up to edge 3->4, 12 insns.
[...]
  [local count: 114863532]:
  _17 = n_12(D) + -1;
  if (_17 > 0)
    goto ; [94.50%]
  else
    goto ; [5.50%]

  [local count: 108546038]:
  _18 = 0;
  _19 = _18 * 4;
  _20 = a_13(D) + _19;
  _21 = *_20;
  _22 = _18 + 1;
  _23 = _22 * 4;
  _24 = a_13(D) + _23;
  _25 = *_24;
  if (_21 > _25)
    goto ; [5.50%]
  else
    goto ; [94.50%]

  [local count: 906139986]:
  _1 = (long unsigned int) i_15;
  _2 = _1 * 4;
  _3 = a_13(D) + _2;
  _4 = *_3;
  _5 = _1 + 1;
  _6 = _5 * 4;
  _7 = a_13(D) + _6;
  _8 = *_7;
  if (_4 > _8)
    goto ; [5.50%]
  else
    goto ; [94.50%]

  [local count: 958878293]:
  # i_26 = PHI <0(3), i_15(4)>
  i_15 = i_26 + 1;
  _9 = n_12(D) + -1;
  if (_9 > i_15)
    goto ; [94.50%]
  else
    goto ; [5.50%]

(throttling it down with --param max-loop-header-insns=5 gives the expected optimization)
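For illustration, the loop shape that predcom would ideally arrive at can be written by hand. The sketch below is an assumption about the desired end result, not compiler output: is_sorted_carried is a hypothetical name, and the code simply rotates the loop into do-while form and carries the second load of one iteration over as the first operand of the next.

```c
/* Original form from the report: two loads per iteration.  */
static int is_sorted(const int *a, int n)
{
    for (int i = 0; i < n - 1; i++)
        if (a[i] > a[i + 1])
            return 0;
    return 1;
}

/* Hand-written do-while form: only the guard is peeled off, and the
   value loaded as a[i + 1] is reused as a[i] on the next iteration,
   leaving one load per iteration.  */
static int is_sorted_carried(const int *a, int n)
{
    if (n - 1 <= 0)
        return 1;

    int prev = a[0];
    int i = 0;
    do {
        int next = a[i + 1];
        if (prev > next)
            return 0;
        prev = next;
        i++;
    } while (i < n - 1);
    return 1;
}
```

Here only the comparison against n - 1 is duplicated ahead of the loop, in contrast to the dump above, where header copying also duplicated both loads and the comparison between them.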
[Bug c++/85091] Compiler generates different code depending on whether -Wnonnull -Woverloaded-virtual given or not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85091 --- Comment #13 from Alexander Monakov --- > (in the diffs, plus-lines correspond to -Wnonnull added to command line) No, sorry, it was the other way around. Here's the reverse diff with more context: if (0) { <; } - if (0) -{ - <; -} } It corresponds to if(!(!std::signbit(bourn_cast( From(0) { lmi_test::record_error(); }; if(!(std::signbit(bourn_cast(-From(0) { lmi_test::record_error(); }; in template instantiation test_floating_conversions. Essentially, with -Wnonnull the second condition seems to be folded to truth value.
[Bug c++/85091] Compiler generates different code depending on whether -Wnonnull -Woverloaded-virtual given or not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85091 --- Comment #12 from Alexander Monakov --- I can reproduce it with downloaded Debian's cc1plus, and for me -Wnonnull alone is sufficient to cause diverging codegen. It diverges very early, in the frontend: diff of .tu dumps starts with: --- a/1/16795.cpp.001t.tu +++ b/2/16795.cpp.001t.tu @@ -110354,336 +110354,337 @@ @56158 bind_exprtype: @27 body: @59125 @56159 cond_exprtype: @27 op 0: @5106op 1: @59126 op 2: @59127 -@56160 cleanup_point_expr type: @27 op 0: @59128 -@56161 convert_expr type: @27 op 0: @59129 -@56162 call_exprtype: @109 fn : @59130 0 : @59131 - 1 : @59132 -@56163 expr_stmttype: @27 line: 732 expr: @59133 -@56164 cleanup_point_expr type: @109 op 0: @59134 +@56160 cond_exprtype: @27 op 0: @5106op 1: @59128 + op 2: @59129 +@56161 convert_expr type: @27 op 0: @59130 +@56162 call_exprtype: @109 fn : @59131 0 : @59132 + 1 : @59133 +@56163 expr_stmttype: @27 line: 732 expr: @59134 +@56164 cleanup_point_expr type: @109 op 0: @59135 and .original diff has the following hunk: @@ -17695,8 +17695,11 @@ return = __out; <; } - <; +} } (in the diffs, plus-lines correspond to -Wnonnull added to command line)
[Bug c++/85091] Compiler generates different code depending on whether -Wnonnull -Woverloaded-virtual given or not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85091 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #8 from Alexander Monakov --- Vadim, can you please check if the issue is reproducible on preprocessed (-E) input as well, and if so, attach the preprocessed testcase so people can try to repro it without downloading Debian's MinGW headers? Thanks.
[Bug sanitizer/84761] AddressSanitizer is not compatible with glibc 2.27 on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84761 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #7 from Alexander Monakov --- Is it possible that a distribution would backport glob() changes together with its symver update (without also backporting the regparm change)? In that case the dlvsym check shown above will be wrong I think. Would the approach with confstr query for glibc version (discussed on irc) be less fragile?
[Bug inline-asm/84861] -flto with asm() optimizes too much
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84861 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #3 from Alexander Monakov --- PR 57703 seems to be the "canonical instance" for the toplevel-asms-with-lto issue.
[Bug middle-end/84681] New: tree-ter moving code too much
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84681

Bug ID: 84681
Summary: tree-ter moving code too much
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
Target: x86_64

The following code (derived from a hot loop in a Huffman encoder, reported by Fabian Giesen) suffers from TER activity too much on x86-64. TER lifts loads+zero_extends to the BB head, sinking variable-length shifts and increasing register pressure too badly.

Not being very familiar with TER, I think it would be good to understand why loads are lifted all the way up to BB head like that. That's probably not supposed to happen (and may be fixable without a TER overhaul?)

unsigned long long f(unsigned char *from, unsigned char *from_end,
                     unsigned long long *codes, unsigned char *lens)
{
    unsigned char sym0, sym1, sym2;
    unsigned long long bits0=0, bits1=0, bits2=0;
    unsigned char count0=0, count1=0, count2=0;
    do {
        sym0 = *from++; bits0 |= codes[sym0] << count0; count0 += lens[sym0];
        sym1 = *from++; bits1 |= codes[sym1] << count1; count1 += lens[sym1];
        sym2 = *from++; bits2 |= codes[sym2] << count2; count2 += lens[sym2];
        sym0 = *from++; bits0 |= codes[sym0] << count0; count0 += lens[sym0];
        sym1 = *from++; bits1 |= codes[sym1] << count1; count1 += lens[sym1];
        sym2 = *from++; bits2 |= codes[sym2] << count2; count2 += lens[sym2];
        sym0 = *from++; bits0 |= codes[sym0] << count0; count0 += lens[sym0];
        sym1 = *from++; bits1 |= codes[sym1] << count1; count1 += lens[sym1];
        sym2 = *from++; bits2 |= codes[sym2] << count2; count2 += lens[sym2];
        sym0 = *from++; bits0 |= codes[sym0] << count0; count0 += lens[sym0];
        sym1 = *from++; bits1 |= codes[sym1] << count1; count1 += lens[sym1];
        sym2 = *from++; bits2 |= codes[sym2] << count2; count2 += lens[sym2];
        sym0 = *from++; bits0 |= codes[sym0] << count0; count0 += lens[sym0];
        sym1 = *from++; bits1 |= codes[sym1] << count1; count1 += lens[sym1];
        sym2 = *from++; bits2 |= codes[sym2] << count2; count2 += lens[sym2];
    } while(from != from_end);
    return bits0+bits1+bits2;
}
[Bug tree-optimization/84562] -faggressive-loop-optimizations makes decisions based on weak data structures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84562

Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov ---
It's not just -faggressive-loop-optimizations, it seems that constructors of weak globals are available for folding, and I really doubt that's actually intended; after all, GCC does always consider weak function interposable, so why not objects? Compare:

__attribute__((weak)) const int x=0;
int f(){return x==0;}

f:
    movl    $1, %eax
    ret

vs.

__attribute__((weak)) int x(void){return 0;}
int f(){return x()==0;}

f:
    subq    $8, %rsp
    call    x
    testl   %eax, %eax
    sete    %al
    movzbl  %al, %eax
    popq    %rdx
    ret
[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566 Alexander Monakov changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2018-02-26 CC||abel at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #3 from Alexander Monakov ---

Confirmed. Our comparator breaks here:

  /* Prefer SCHED_GROUP_P insns to any others.  */
  if (SCHED_GROUP_P (tmp_insn) != SCHED_GROUP_P (tmp2_insn))
    {
      if (VINSN_UNIQUE_P (tmp_vinsn) && VINSN_UNIQUE_P (tmp2_vinsn))
        return SCHED_GROUP_P (tmp2_insn) ? 1 : -1;
      /* Now uniqueness means SCHED_GROUP_P is set, because schedule
         groups cannot be cloned.  */
      if (VINSN_UNIQUE_P (tmp2_vinsn))
        return 1;
      return -1;
    }

when we have two non-unique insns such that one is in a sched group. That is not supposed to happen actually, since SCHED_GROUP_P should imply VINSN_UNIQUE_P. This invariant is broken when sched_macro_fuse_insns sets SCHED_GROUP_P without looking at deps->readonly.

So while we could get rid of the issue by rewriting the problematic sel-sched code in terms of SCHED_GROUP_P only, lack of deps->readonly check for macro-fusion seems like a bigger issue and should be fixed too.
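To make the failure mode concrete, here is a minimal sketch of the branch quoted above, reduced to a toy model. The struct `insn_model` and all function names are hypothetical and not part of GCC; the two booleans merely stand in for SCHED_GROUP_P and VINSN_UNIQUE_P. It shows that for two non-unique insns of which exactly one is in a sched group, the comparison returns -1 in both directions, and that a variant written in terms of the group flag alone does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an insn: only the two flags the branch tests. */
struct insn_model {
    bool sched_group; /* models SCHED_GROUP_P */
    bool unique;      /* models VINSN_UNIQUE_P */
};

/* Same decision structure as the quoted branch; 0 if the group flags agree. */
int cmp_quoted(const struct insn_model *a, const struct insn_model *b)
{
    if (a->sched_group != b->sched_group) {
        if (a->unique && b->unique)
            return b->sched_group ? 1 : -1;
        if (b->unique)
            return 1;
        return -1;
    }
    return 0;
}

/* Variant that looks at the group flag only. */
int cmp_group_only(const struct insn_model *a, const struct insn_model *b)
{
    if (a->sched_group != b->sched_group)
        return b->sched_group ? 1 : -1;
    return 0;
}

/* A qsort comparator must satisfy cmp(a, b) == -cmp(b, a). */
bool anti_commutative(int (*cmp)(const struct insn_model *,
                                 const struct insn_model *),
                      const struct insn_model *a,
                      const struct insn_model *b)
{
    return cmp(a, b) == -cmp(b, a);
}

/* Checks both claims on the problematic pair of insns. */
void check_model(void)
{
    struct insn_model grouped = { true, false };  /* in a group, not unique */
    struct insn_model plain = { false, false };   /* no group, not unique */

    assert(!anti_commutative(cmp_quoted, &grouped, &plain));
    assert(anti_commutative(cmp_group_only, &grouped, &plain));
}
```

The model only covers the comparator symptom; it says nothing about the deps->readonly problem in macro-fusion, which is the underlying cause named in the comment.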
[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566 --- Comment #2 from Alexander Monakov --- Bah, built a wrong branch, not the trunk. I'll recheck later, sorry for the noise.
[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566 --- Comment #1 from Alexander Monakov --- Sorry, I cannot reproduce this. I've built a cross-compiler from today's trunk via 'configure --target aarch64-linux-gnu && make all-gcc' (i.e. just to cc1plus, no binutils etc.) and it doesn't abort. If possible please add 'g++ -v' output, svn revision, and any other info that can help me reproduce the issue.
[Bug target/84301] [6/7/8 Regression] ICE in create_pre_exit, at mode-switching.c:451
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301 --- Comment #4 from Alexander Monakov ---

Moreover, without --param selsched-max-lookahead=2 sel-sched moves both the assignment and use into the middle of BB 2, breaking the assumption in mode-switching that retval use is the last insn:

  /* If this function returns a value at the end, we have to
     insert the final mode switch before the return value copy
     to its hard register.  */
  if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
      && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
      && GET_CODE (PATTERN (last_insn)) == USE
      && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)

(independently of max-pending-list-length being 0 or not).

It seems a bit surprising that mode-switching needs to treat the return value specially, but more importantly, are the restrictions on return value register set/use placement written down somewhere? I don't see any explicit dependencies or barriers either, so isn't this like a repeat of the cc0 situation? What are other (dozens of) RTL passes doing to avoid disturbing the required order?

Looking via gdb, apparently what pins those uses/clobbers to BB end for haifa-sched is:

  /* Selective scheduling handles control dependencies by itself.  */
  if (!sel_sched_p ())
    add_branch_dependences (head, tail);

but the function doesn't do what it says on the tin:

/* Add dependences so that branches are scheduled to run last in their
   block.  */
static void
add_branch_dependences (rtx_insn *head, rtx_insn *tail)
{
  rtx_insn *insn, *last;

  /* For all branches, calls, uses, clobbers, cc0 setters, and instructions
     that can throw exceptions, force them to remain in order at the end of
     the block by adding dependencies and giving the last a high priority.
     There may be notes present, and prev_head may also be a note.
[Bug c++/84191] Compiler ICEs when trying to resolve impossible arithmetic operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84191 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #2 from Alexander Monakov --- Testcase needs -march=znver1 for builtins to be available (comment #0 shows -march=native which is unfortunately ambiguous).
[Bug c/70952] Missing warning for likely-erroneous octal escapes in string literals
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70952 --- Comment #7 from Alexander Monakov --- Code in comment #0 is also valid, it's just rather questionable (the octal literal is \00) and most likely unintended (or intentionally misleading).
[Bug c/70952] Missing warning for likely-erroneous octal escapes in string literals
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70952 Alexander Monakov changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|DUPLICATE |--- --- Comment #5 from Alexander Monakov ---

No, it's not a dup? Invalid octal literals outside of strings are already properly diagnosed, so the other bug talks about warning about them _as a matter of style_. This bug is about confusing use of octal literals in string constants. Compare:

char c=008;
error: invalid digit "8" in octal constant

char c[]="\008";
[silently accepted with -Wall -Wextra, emits a string literal of size 3]
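The size of 3 follows from how C lexes escape sequences: an octal escape takes at most three octal digits and stops at the first character that is not one, so "\008" is the escape \00, then the ordinary character '8', then the implicit terminator. A small sketch that checks this (the names `lit` and `octal_literal_size` are made up for the illustration):

```c
#include <assert.h>
#include <stddef.h>

/* "\008" is parsed as the octal escape \00 (a NUL byte) followed by the
   ordinary character '8'; the compiler then appends the terminating NUL. */
const char lit[] = "\008";

/* Verifies the three bytes and returns the array size of the literal. */
size_t octal_literal_size(void)
{
    assert(lit[0] == '\0');
    assert(lit[1] == '8');
    assert(lit[2] == '\0');
    return sizeof lit;
}
```

Because the first byte is already NUL, functions such as strlen would treat the string as empty, which is part of why the construct is most likely unintended.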
[Bug gcov-profile/84107] New: indirect call profiling broken with multiple DSOs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84107 Bug ID: 84107 Summary: indirect call profiling broken with multiple DSOs Product: gcc Version: 8.0 Status: UNCONFIRMED Keywords: visibility, wrong-code Severity: normal Priority: P3 Component: gcov-profile Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org CC: marxin at gcc dot gnu.org Target Milestone: ---

Created attachment 43272
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43272&action=edit
testcase archive

(Marxin, on IRC you've requested this bug to be filed; enjoy!)

The finely crafted testcase in the attachment segfaults with null pointer dereference in __gcov_indirect_call_profiler_v2.

In general libgcov should have "hidden" visibility on small symbols that have no need to inter-operate between different shared objects and can be freely duplicated in user-built shared libraries (thus indirect profiling symbols probably all miss the visibility annotation). Large symbols and symbols that must exist in exactly one instance in the running program probably should be a part of (nonexistent) libgcov.so.0.
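For readers unfamiliar with the annotation being asked for, the following is a minimal sketch of GCC's visibility attribute applied to a small helper and its state. All names here (`profile_hits`, `record_hit`, `run_and_count`) are hypothetical and are not libgcov symbols; the sketch only illustrates the general mechanism, not the actual fix.

```c
#include <assert.h>

/* Hypothetical per-DSO state, NOT a real libgcov symbol. With hidden
   visibility it is not exported from a shared object, so every shared
   library that links this code keeps its own private copy. */
__attribute__((visibility("hidden"))) long profile_hits;

/* Hypothetical small helper, also hidden: references to it bind locally
   instead of being resolved across shared objects at run time. */
__attribute__((visibility("hidden")))
void record_hit(void)
{
    profile_hits++;
}

/* Entry point with default visibility, callable from other objects.
   Records n hits and returns the running total. */
long run_and_count(int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        record_hit();
    return profile_hits;
}
```

When such code is compiled with -fPIC into two different shared libraries, each library updates only its own `profile_hits`, which is the "freely duplicated" behaviour the report describes for small symbols.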
[Bug rtl-optimization/83913] [6/7/8 Regression] Compile time and memory hog w/ selective scheduling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83913 --- Comment #2 from Alexander Monakov --- Thanks. While I could not find why we blow up with Haswell tuning but not, say, Sandybridge, the main problem is that with all those -fno-... flags we have a few insns of the form rK = rN where rN is loop-invariant and rK is unused, so the insns are movable anywhere, including across the loop backedge (since pipelining is enabled). We try to fill schedule holes (caused by long-latency integer division insns) by repeatedly pipelining them. Eventually the sched_times cutoff should prevent that, but sched_times doesn't grow as intended because bookkeeping copies get sched_times 0, and expr merging takes the minimum of two sched_times.
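The interaction between the counter and the minimum can be shown with a deliberately simplified model. This is not sel-sched code: `merge_sched_times` and `pipeline_rounds` are hypothetical names, and the loop assumes that every pipelining round bumps the counter once and then merges the expr with one bookkeeping copy.

```c
#include <assert.h>

/* Hypothetical model of expr merging: the merged expr keeps the smaller
   of the two counters, as the comment describes. */
int merge_sched_times(int a, int b)
{
    return a < b ? a : b;
}

/* Pipelines one expr `rounds` times. Each round increments the counter
   and then merges with a bookkeeping copy whose counter is `copy_times`.
   Returns the counter a cutoff check would see afterwards. */
int pipeline_rounds(int rounds, int copy_times)
{
    assert(rounds >= 0);
    int times = 0;
    for (int i = 0; i < rounds; i++) {
        times += 1;                                   /* scheduled once more */
        times = merge_sched_times(times, copy_times); /* merge with the copy */
    }
    return times;
}
```

With a copy counter of 0 the result stays at 0 however many rounds are run, so a threshold on it is never reached; with a copy counter that does not pull the value down (for example `pipeline_rounds(1000, 1000)`), the result grows with the number of rounds.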