[Bug target/109519] aarch64: wrong code with NEON intrinsics on gcc-10 and later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109519 --- Comment #5 from Sebastian Pop --- Thanks Andrew for the patch, it fixes the issue.
[Bug target/109519] New: aarch64: wrong code with NEON intrinsics on gcc-10 and later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109519

            Bug ID: 109519
           Summary: aarch64: wrong code with NEON intrinsics on gcc-10 and later
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

Steps to reproduce:

$ git clone https://github.com/sebpop/bitshuffle.git -b gcc-10-bug
$ cd bitshuffle/reproduce
$ make
$ ./a.out

The expected output is produced by gcc-7, gcc-9, and clang-15.

16384 4 14 16 33 39 45 51 57 67 102 108 120 126 128 134 138 140 [...]

gcc-9 is the last version of gcc I tested that works.

gcc-10 produces the following output:

./a.out
16384 0 0 0 0 39 45 51 57

gcc-11 and gcc-trunk produce the following output:

./a.out
16384 0 0 0 0 0 0 0

The output is also correct when removing the before-last patch from the git
repo https://github.com/kiyo-masui/bitshuffle/pull/140
This patch exposes the bug in gcc by using NEON intrinsics instead of scalar
computations to translate move_mask instructions from SSE2 to NEON.
[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spop at gcc dot gnu.org

--- Comment #18 from Sebastian Pop ---
A new 5% regression happened in gcc-trunk more recently and may be due to
another patch.

Rama was bisecting a 15% perf regression on lbm when updating gcc-7 to gcc-10.
The regression can be seen on the LNT graph link from comment#3
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=633.477.0=683.477.0=664.477.0=648.477.0=618.477.0=605.477.0=759.477.0=584.477.0

gcc-6 has execution time of 213 seconds
gcc-7 is at 215 seconds
gcc-8 is at 266
gcc-9 at 259
gcc-10 at 260

Honza's patch seems to be unrelated: the slowdown already appears in gcc-8,
while the patch was committed to trunk during gcc-10 development, before the
gcc-10 release on May 7, 2020:

commit a9a4edf0e71bbac9f1b5dcecdcf9250111d16889
Author: Jan Hubicka
Date:   Sat Nov 30 22:25:24 2019 +0100

    Update max_bb_count in execute_fixup_cfg

We need to git-bisect between gcc-7 and gcc-8.
[Bug debug/98776] DW_AT_low_pc is inconsistent with function entry address, when enabling -fpatchable-function-entry
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Sebastian Pop ---
Fixed for arm64 as well on master, and backported to active branches gcc-12,
11, and 10.
[Bug debug/98776] DW_AT_low_pc is inconsistent with function entry address, when enabling -fpatchable-function-entry
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776 --- Comment #10 from Sebastian Pop --- Patch for arm64: https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607601.html
[Bug middle-end/107485] [10 Regression] gcc-10 ICE with -fnon-call-exception
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107485 --- Comment #10 from Sebastian Pop --- Thanks Richard. The patch fixed the larger test as well.
[Bug middle-end/107485] New: gcc-10 ICE with -fnon-call-exception
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107485

            Bug ID: 107485
           Summary: gcc-10 ICE with -fnon-call-exception
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

On arm64-linux I see the following crash only on gcc-10.
I do not see the ICE on gcc-11, 12, and trunk.

$ ~/gcc-10/bld/gcc/cc1plus -fnon-call-exceptions f.ii
[...]
f.ii:29:23: internal compiler error: Segmentation fault
   29 | template void x(double *, b, unsigned long *) { f(); }
      | ^
0x134e58b crash_signal
        ../../gcc/toplev.c:328
0x1639464 tree_vec_extract(gimple_stmt_iterator*, tree_node*, tree_node*, tree_node*, tree_node*)
        ../../gcc/tree-vect-generic.c:140
0x163ca0f expand_vector_condition
        ../../gcc/tree-vect-generic.c:1044
0x164081f expand_vector_operations_1
        ../../gcc/tree-vect-generic.c:1988
0x16419f7 expand_vector_operations
        ../../gcc/tree-vect-generic.c:2240
0x1641b3f execute
        ../../gcc/tree-vect-generic.c:2284
[...]
$ cat f.ii
typedef long a;
typedef double b;
typedef struct {
  a c __attribute__((__vector_size__(32)));
  b d __attribute__((__vector_size__(32)));
} e;
__attribute__((__always_inline__)) b f() {
  e g, h, i;
  g.c = h.d < i.d;
}
class j {
  bool k();
};
template void ab(aa, l, n) {
  int o;
  typename n::p q;
  unsigned long r;
  q(0, o, );
}
namespace s {
template void t(j *, long, long, unsigned long *, int u) {
  n ac;
  void v();
  ab(v, u, ac);
}
} // namespace s
struct w {
  template void x(double *, b, unsigned long *) { f(); }
  double ad;
  void operator()(double, double, unsigned long *) {
    unsigned long m;
    x<0>(, 0, );
  }
};
using s::t;
struct y {
  using p = w;
};
long ag, ah;
unsigned long ai;
double aj;
bool j::k() {
  using n = y;
  t(this, ag, ah, , aj);
}

git bisect stops on this patch:

commit 1e676cfbe1e13fba2c636b560362ed4f0a56893d
Author: Richard Biener
Date:   Mon May 18 08:51:23 2020 +0200

    middle-end/95171 - inlining of trapping compare into non-call EH fn

    This fixes always-inlining across -fnon-call-exception boundaries
    for conditions which we do not allow to throw.

    2020-05-18  Richard Biener

            PR middle-end/95171
            * tree-inline.c (remap_gimple_stmt): Split out trapping compares
            when inlining into a non-call EH function.

            * gcc.dg/pr95171.c: New testcase.

    (cherry picked from commit fe168751c5c1c517c7c89c9a1e4e561d66b24663)
[Bug debug/98776] DW_AT_low_pc is inconsistent with function entry address, when enabling -fpatchable-function-entry
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spop at gcc dot gnu.org

--- Comment #9 from Sebastian Pop ---
Hi, is somebody working on fixing this on arm64?
If not I will be working on it.
The linux kernel needs this fixed for systemtap and perf probe.
[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #14 from Sebastian Pop ---
Fixed.
[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #52762|0                           |1
        is obsolete|                            |

--- Comment #8 from Sebastian Pop ---
Created attachment 52826
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52826&action=edit
patch

You are right. Please see attached an amended patch that only adds the
barriers to __sync builtins.
[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #52755|0                           |1
        is obsolete|                            |

--- Comment #5 from Sebastian Pop ---
Created attachment 52762
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52762&action=edit
patch

The attached patch fixes the issue for __sync builtins by adding the missing
barrier to the -march=armv8-a+nolse path in the outline-atomics functions.

The patch also changes the behavior of __atomic builtins for
-moutline-atomics -march=armv8-a+nolse to be the same as for
-march=armv8-a+lse.
[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162 --- Comment #4 from Sebastian Pop --- The attached patch degrades performance on cpus with LSE: the barrier is not needed when outline-atomics execute an LSE instruction. I was thinking to add the barrier to the armv8.0 generic path (no LSE) in the outline-atomics functions.
[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #52750|0                           |1
        is obsolete|                            |

--- Comment #3 from Sebastian Pop ---
Created attachment 52755
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52755&action=edit
patch

LSE atomics do not need a barrier.
Updated the patch to only generate the barriers after outline-atomics calls.
[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

--- Comment #2 from Sebastian Pop ---
Created attachment 52750
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52750&action=edit
patch

Fix.
[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162 --- Comment #1 from Sebastian Pop --- Also happens when compiling with LSE: -march=armv8.1-a or later.
[Bug target/105162] New: [AArch64] outline-atomics drops dmb ish barrier on __sync builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

            Bug ID: 105162
           Summary: [AArch64] outline-atomics drops dmb ish barrier on __sync builtins
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

With -mno-outline-atomics gcc produces a `dmb ish` barrier on __sync builtins
as required by the Intel specification (see fix for
https://gcc.gnu.org/PR65697
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=f70fb3b635f9618c6d2ee3848ba836914f7951c2
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=ab876106eb689947cdd8203f8ecc6e8ac38bf5ba
)

$ cat a.c
int foo(int a) {
  return __sync_bool_compare_and_swap(&a, 4, 5);
}

$ gcc -O2 a.c -S -o- -mno-outline-atomics
foo:
        sub     sp, sp, #16
        mov     w1, 5
        str     w0, [sp, 12]
        add     x0, sp, 12
.L4:
        ldxr    w2, [x0]
        cmp     w2, 4
        bne     .L5
        stlxr   w3, w1, [x0]
        cbnz    w3, .L4
.L5:
        dmb     ish
        cset    w0, eq
        add     sp, sp, 16
        ret

With -moutline-atomics gcc does not generate the barrier:

$ gcc -O2 a.c -S -o- -moutline-atomics
foo:
        stp     x29, x30, [sp, -32]!
        mov     w1, 5
        mov     x29, sp
        add     x2, sp, 28
        str     w0, [sp, 28]
        mov     w0, 4
        bl      __aarch64_cas4_acq_rel
        cmp     w0, 4
        cset    w0, eq
        ldp     x29, x30, [sp], 32
        ret

Happens on gcc-8, 9, 10, 11, and trunk.
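For comparison, here is a minimal sketch of the same compare-and-swap written
with both builtin families. The function names cas_sync and cas_acq_rel are
hypothetical and only serve the illustration; the builtins are the documented
GCC ones. The GCC manual describes the __sync builtins as full barriers in
most cases, while an __atomic operation only promises the memory order passed
to it, which is why the missing `dmb ish` matters for the __sync form.

```c
#include <stdbool.h>

/* Hypothetical helper mirroring foo() from the report: the legacy __sync
   builtin, which GCC documents as a full memory barrier.  */
bool cas_sync(int a)
{
    return __sync_bool_compare_and_swap(&a, 4, 5);
}

/* Same operation with the __atomic builtin and an explicit acquire-release
   order; no full barrier is implied by this form.  */
bool cas_acq_rel(int a)
{
    int expected = 4;
    return __atomic_compare_exchange_n(&a, &expected, 5, false,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
```

Compiling both functions with -O2 -S and comparing -moutline-atomics against
-mno-outline-atomics should make the difference in barriers visible on an
affected compiler.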
[Bug rtl-optimization/99346] New: [aarch64] ICE in gen_rtx_SUBREG, at emit-rtl.c:1021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99346

            Bug ID: 99346
           Summary: [aarch64] ICE in gen_rtx_SUBREG, at emit-rtl.c:1021
           Product: gcc
           Version: 8.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

Created attachment 50289
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50289&action=edit
pre-processed reduced testcase

gcc-8, gcc-9, and gcc-10 from Ubuntu 20.04 fail to compile the attached test
at -O2 and -O3 on Graviton2 aarch64-linux.

$ g++-10 -O2 a.ii
[...]
a.ii:362:50: internal compiler error: in gen_rtx_SUBREG, at emit-rtl.c:1021

$ g++-8 -O2 a.ii
[...]
a.ii:493:11: internal compiler error: in gen_rtx_SUBREG, at emit-rtl.c:1010

A similar bug was reported and fixed on x86:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83723
[Bug c++/99012] gcc-8.4.0 on aarch64 hits internal error during RTL pass: expand if `std::copysign` is used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99012 --- Comment #3 from Sebastian Pop --- I do not see the bug with today's cc1plus from origin/releases/gcc-8
[Bug c++/99012] gcc-8.4.0 on aarch64 hits internal error during RTL pass: expand if `std::copysign` is used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99012

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spop at gcc dot gnu.org

--- Comment #2 from Sebastian Pop ---
I see the bug with

$ gcc-8 --version
gcc-8 (Ubuntu/Linaro 8.4.0-1ubuntu1~18.04) 8.4.0
[Bug target/98877] New: [AArch64] Inefficient code generated for tbl NEON intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877

            Bug ID: 98877
           Summary: [AArch64] Inefficient code generated for tbl NEON intrinsics
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

The code GCC generates for NEON intrinsics is inefficient, which leads
developers to prefer inline assembly over intrinsics.
A similar performance bug for vmlal intrinsics was reported in
https://gcc.gnu.org/PR92665

The code generated by GCC for table lookups is also inefficient:

$ cat red.c
#include "arm_neon.h"
uint8x16_t fun(uint8x16_t lo, uint8x16_t hi, uint8x16_t idx) {
  uint8x16x2_t tab = { .val = {lo, hi} };
  uint8x16_t res = vqtbl2q_u8(tab, idx);
  return res;
}

$ gcc -O3 -S -o- red.c
fun:
        mov     v4.16b, v0.16b
        mov     v5.16b, v1.16b
        tbl     v0.16b, {v4.16b - v5.16b}, v2.16b
        ret

$ clang -O3 -S -o- red.c
fun:
        tbl     v0.16b, { v0.16b, v1.16b }, v2.16b
        ret
[Bug target/97802] New: [AArch64] Incorrect documentation for Arm64 NEON
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97802

            Bug ID: 97802
           Summary: [AArch64] Incorrect documentation for Arm64 NEON
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

The following text in doc/invoke.texi seems to be outdated.
To avoid confusion the text needs to be more specific on which NEON
implementations it applies:

"If the selected floating-point hardware includes the NEON extension
(e.g.@: @option{-mfpu=neon}), note that floating-point operations are
not generated by GCC's auto-vectorization pass unless
@option{-funsafe-math-optimizations} is also specified.  This is
because NEON hardware does not fully implement the IEEE 754 standard for
floating-point arithmetic (in particular denormal values are treated as
zero), so the use of NEON instructions may lead to a loss of precision."

This used to be true for older NEON implementations.
NEON implementation in Armv8 and later is IEEE 754 compliant.
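A quick way to see what "denormal values are treated as zero" means in
practice is the sketch below. The function name keeps_subnormals is
hypothetical. It assumes the default floating-point environment (no
flush-to-zero mode and no -ffast-math); on an IEEE 754 compliant
implementation, halving the smallest normal float gives a nonzero subnormal
instead of zero.

```c
#include <float.h>

/* Returns 1 when halving the smallest normal float yields a nonzero
   subnormal (IEEE 754 gradual underflow), and 0 when the result is
   flushed to zero.  volatile keeps the division at run time.  */
int keeps_subnormals(void)
{
    volatile float smallest_normal = FLT_MIN;
    volatile float half = smallest_normal / 2.0f;
    return half != 0.0f && half < FLT_MIN;
}
```

On hardware that flushes denormals to zero, the same function would be
expected to return 0.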
[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665 --- Comment #7 from Sebastian Pop --- Hi Andrew, have you committed the fix for this?
[Bug target/92692] Saving off the callee saved register between ldxr/stxr (caused by shrink wrapping improvements)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92692

--- Comment #23 from Sebastian Pop ---
> I don't see anything like that on the gcc-9 branch - are you sure you don't
> have an outstanding change somehow?

You are right, a part of the -moutline-atomics patch that I am working on
backporting to branch 9 added that change.
[Bug target/92692] Saving off the callee saved register between ldxr/stxr (caused by shrink wrapping improvements)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92692

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spop at gcc dot gnu.org

--- Comment #21 from Sebastian Pop ---
It looks like this hunk from the trunk version of the patch is missing on
gcc-9 branch:

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index cabcc58f1a0..1458bc00095 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -104,7 +104,7 @@
    (clobber (match_scratch:SI 7 "="))]
   ""
   "#"
-  "&& reload_completed"
+  "&& epilogue_completed"
   [(const_int 0)]
   {
     aarch64_split_compare_and_swap (operands);

With this hunk applied my bootstrap passes on the gcc-9 branch on an
aarch64-linux graviton2.
Without this hunk I see an error in thread sanitizers.

I also have checked gcc-8 release branch and it seems that the patch is not
missing any hunks in that branch.

Could somebody apply the missing hunk to the gcc-9 release branch?
Thanks!
[Bug rtl-optimization/92665] New: [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

            Bug ID: 92665
           Summary: [AArch64] low lanes select not optimized out for vmlal intrinsics
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

With gcc as of today I see dup instructions that could be optimized out:

$ cat red.c
#include "arm_neon.h"
int32x4_t fun(int32x4_t a, int16x8_t b, int16x8_t c) {
  a = vmlal_s16(a, vget_low_s16(b), vget_low_s16(c));
  a = vmlal_high_s16(a, b, c);
  return a;
}

$ gcc -O3 -S -o- red.c
fun:
        dup     d3, v1.d[0]
        dup     d4, v2.d[0]
        smlal   v0.4s,v3.4h,v4.4h
        smlal2  v0.4s,v1.8h,v2.8h
        ret

$ clang -O3 -S -o- red.c
fun:
        smlal   v0.4s, v1.4h, v2.4h
        smlal2  v0.4s, v1.8h, v2.8h
        ret
[Bug tree-optimization/86865] [9 Regression] Wrong code w/ -O2 -floop-parallelize-all -fstack-reuse=none -fwrapv -fno-tree-ch -fno-tree-dce -fno-tree-dominator-opts -fno-tree-loop-ivcanon
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86865

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spop at gcc dot gnu.org

--- Comment #7 from Sebastian Pop ---
I think the patch is ok.

If in the future we want to handle those other loops, we will need to compute
the loop bound in add_loop_constraints() with a check for whether the stmt is
dominated by the exit or not.

Here is what we do today for all stmts in the loop:

  tree nb_iters = number_of_latch_executions (loop);
  if (TREE_CODE (nb_iters) == INTEGER_CST)
    {
      /* loop_i <= cst_nb_iters */

The '<=' constraint on the statements' iteration domains implies that the
loop must be in do-while form.
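To make the off-by-one concrete, here is a small sketch that counts how often
statements before and after the exit test run, compared with the number of
latch executions. The struct and function names are hypothetical and the loop
shape is only an assumption for illustration, not the testcase from the bug.

```c
#include <assert.h>

/* Hypothetical counters: statements placed before the exit test, statements
   placed after it, and executions of the back edge (latch).  */
struct counts { long before_exit; long after_exit; long latch; };

struct counts count_executions(long n)
{
    assert(n >= 0);
    struct counts c = {0, 0, 0};
    long i = 0;
    for (;;) {
        c.before_exit++;   /* runs on every entry into the loop header */
        if (i >= n)
            break;         /* loop exit */
        c.after_exit++;    /* only reached when the exit is not taken */
        i++;
        c.latch++;         /* back edge taken */
    }
    return c;
}
```

With nb_iters latch executions, the statement before the exit runs
nb_iters + 1 times, which matches 0 <= loop_i <= nb_iters, while the statement
after the exit runs only nb_iters times and would need a strict bound.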
[Bug tree-optimization/87917] ICE in initialize_matrix_A at gcc/tree-data-ref.c:3150
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87917

Sebastian Pop changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spop at gcc dot gnu.org

--- Comment #3 from Sebastian Pop ---
> Sebastian - can you say if
> evolution_function_is_affine_multivariate_p ({0, +, {0, +, 4}_1}_2, 1)
> should really return true?

You are right, {0, +, {0, +, 4}_1}_2 is not a valid affine multivariate
function: only the base (not the step) should vary in an outer loop.

For example, this would be an affine multivariate: {{0, +, 4}_1, +, 42}_2.
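The difference can be sketched by evaluating both chrecs by hand, assuming
the usual reading of {base, +, step}_x as base + step * i_x, where i_x is the
iteration count of loop x. The function names are hypothetical.

```c
/* {{0, +, 4}_1, +, 42}_2: the base varies in loop 1, the step in loop 2 is
   the constant 42, so the value is a linear function of (i1, i2).  */
long affine_chrec(long i1, long i2)
{
    return (0 + 4 * i1) + 42 * i2;
}

/* {0, +, {0, +, 4}_1}_2: the step itself varies in loop 1, which produces
   the product i1 * i2 and is therefore not affine.  */
long non_affine_chrec(long i1, long i2)
{
    return 0 + (0 + 4 * i1) * i2;
}
```

For the affine form, moving one iteration forward in loop 1 always adds 4;
for the other form, the increment depends on i2.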
[Bug tree-optimization/82449] code-gen error in get_rename_from_scev
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82449 --- Comment #2 from Sebastian Pop --- This part is not affine: {0, +, {1, +, 1}_1}_1 This is a polynomial of degree 2. Are you sure the scev analysis reports this as affine? I was trying to understand from the fortran code which part this scev comes from... and I think it comes from the NKL counter that gets incremented in the inner loop, counting the number of iterations of both loops, so it has a quadratic evolution.
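To see the quadratic evolution, the chrec {0, +, {1, +, 1}_1}_1 can be
unrolled by hand: the value starts at 0 and the step, which starts at 1, grows
by 1 on every iteration. The sketch below uses a hypothetical function name
and is only meant to show the arithmetic, not how the scev code evaluates it.

```c
#include <assert.h>

/* Value of {0, +, {1, +, 1}_1}_1 after n iterations of loop 1.  */
long quadratic_chrec(long n)
{
    assert(n >= 0);
    long value = 0;   /* base of the outer chrec */
    long step = 1;    /* base of the inner chrec {1, +, 1}_1 */
    for (long i = 0; i < n; i++) {
        value += step;   /* add the current step */
        step += 1;       /* the step itself evolves */
    }
    return value;
}
```

After n iterations the value is 1 + 2 + ... + n = n * (n + 1) / 2, a
polynomial of degree 2 in n.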
[Bug tree-optimization/69728] [6/7 Regression] internal compiler error: in outer_projection_mupa, at graphite-sese-to-poly.c:1175
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69728

--- Comment #22 from Sebastian Pop ---
> I put it on my TODO to figure out how to "DCE" a stmt
> (or in this case it's rather the whole "loop body", right?).

The code generator would not even see a statement to be generated: it would
just disappear in the new code, so there is nothing to do to DCE statements
with empty domains.

> I've not fully found my way through initial schedule building yet
> (otherwise I would have tried refactoring to not operate in pbb
> vector order but more naturally follow the SESE in a CFG walk with
> maintaining a BB -> pbb mapping).

Yes, DOM-walk could be used to detect the sequential order in which basic
blocks are executed. There are some difficulties in giving an execution
sequence number for if-then and if-else clauses, and for switch cases: for
the moment we represent them as executing in sequence. For example,

if (c)
  a;
else
  b;

we would number the stmts a and b as if the code looked like this:

if (c)
  a;
if (!c)
  b;

which is correct. The fact that the constraint "c" is added to the iteration
domain of "a", and "!c" added to the iter domain of "b" allows the scheduler
to know that there are no sequential dependences between stmts "a" and "b" as
they are executed in different iterations.
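The equivalence of the two forms can be checked with a small sketch. The
function names are hypothetical, and the sketch assumes that statement "a"
does not modify the condition "c"; otherwise the second test would see a
different value.

```c
/* Original form: stmt a and stmt b sit in the two arms of one if-else.  */
int with_else(int c)
{
    int r = 0;
    if (c)
        r = 1;   /* stmt a */
    else
        r = 2;   /* stmt b */
    return r;
}

/* Sequential numbering: a guarded by c, then b guarded by !c.  */
int as_sequence(int c)
{
    int r = 0;
    if (c)
        r = 1;   /* stmt a, domain constrained by c */
    if (!c)
        r = 2;   /* stmt b, domain constrained by !c */
    return r;
}
```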
[Bug tree-optimization/81373] [7/8 Regression] Graphite ICE in ssa_default_def at gcc/tree-dfa.c:305
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81373 --- Comment #4 from Sebastian Pop --- The patch looks good. Thanks!
[Bug tree-optimization/79622] [6/7 Regression] Wrong code w/ -O2 -floop-nest-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79622

--- Comment #10 from Sebastian Pop ---
> So a black-box would be a set of stmts rather than a whole GIMPLE BB

Correct: this can be an abstract view of the IR. The only place where we want
to start transforming the code is in the code generation. We should be able
to interrupt graphite at any point (maybe due to a compute-out) and leave the
original unmodified IR. Code generation should not fail and it should be
linear time in number of statements, such that when we start code generation
we know that it will succeed in a short amount of compilation time.

> You mean this tagging of associativeness is not yet done?

Yes, we removed the tagging code when we removed the out-of-ssa translation.
The original tagging relied on the name of the arrays that we created to find
whether the reduction was associative. This caused some performance
regressions of loops not interchanged anymore (for example the swim loop.)
[Bug tree-optimization/69728] [6/7/8 Regression] internal compiler error: in outer_projection_mupa, at graphite-sese-to-poly.c:1175
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69728

--- Comment #19 from Sebastian Pop ---
> So how'd we properly handle a valid empty domain?

DCE the statement. If the domain for a statement is empty, it means that the
statement does not execute: it is dead code. I think we are better off
enforcing the elimination of the statement, as this wrong analysis (or
translation) of the number of iterations could produce wrong code.

> I assume P_21 is c.7_12

The number after P_ is the ssa variable number, so P_21 is c.7_21.

> we have 0 <= i1 <= 2147483637, whereever that comes from.

You can think about i1 as a canonical induction variable: 0 <= i1 and i1 is
indexing all iterations in that loop: i.e., i1 is incremented by 1.

> Probably from the i1 <= 2147483637 constraint

This constraint is added based on the type of the induction variable, which
gives an upper bound for the iteration domain.

> 4294967296*floor((-1 - P_21)/4294967296) < -P_21 - i1

Yes, this constraint seems to be wrong.
[Bug tree-optimization/79622] [6/7 Regression] Wrong code w/ -O2 -floop-nest-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79622

--- Comment #8 from Sebastian Pop ---
> I would have expected at least each memory op to be in a separate "black box"

We could have a pass before graphite that splits BBs with more than one write
into blocks that contain one data write with all the operations and data
reads needed to compute the stored value. This would allow more freedom to
schedule BBs around.

> if you follow the original go-out-of-SSA approach you'd have their effects
> on the CFG edges. So a more complete fix would similarly handle uses.

In other words: how do we handle reductions?

As you remember, the original way was to expose reductions by rewriting
out-of-SSA scalar dependences crossing basic blocks (loop-phi nodes,
loop-close-phi nodes,) tagging the properties of the reduction (commutative,
associative) on the array, and adding that info to the data dependence graph.
By adding those properties to the dependence graph, we give the scheduler
more freedom to select transforms.

We moved away from rewriting scalar dependences out-of-SSA because we do not
want to transform the code if the scheduler has no better transform to be
done: we do not want to leave around inefficient memory reads/writes.
Instead, we handle SSA names and create scalar references added to the
dependence graph. We still need to tag scalar reductions with their
associative properties to allow the scheduler to reorder the computations.
[Bug tree-optimization/69728] [6/7/8 Regression] internal compiler error: in outer_projection_mupa, at graphite-sese-to-poly.c:1175
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69728

--- Comment #15 from Sebastian Pop ---
It makes sense to fail early when the schedule builder gets confused and
builds an empty domain.
Could you please also add a comment around the if that sets schedule_error?
The change looks good. Thanks.
[Bug tree-optimization/79622] [6/7/8 Regression] Wrong code w/ -O2 -floop-nest-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79622 --- Comment #4 from Sebastian Pop --- Yes, that phi node looks like a reduction. We need to handle the phi as a write to expose the loop carried reduction variable to the dependence analysis. I think your change goes in the right direction. Thanks!
[Bug tree-optimization/68823] [6/7/8 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823

--- Comment #15 from Sebastian Pop ---
> when DR_NUM_DIMENSIONS (dr1->dr) != DR_NUM_DIMENSIONS (dr2->dr) better "FAIL"?

Yes. The patch looks good to me.
[Bug ipa/65972] ICE after applying a patch to enable verify_ssa with auto-pgo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972 --- Comment #9 from Sebastian Pop --- In the link in the previous comment, Richi has a similar patch as suggested by Dehao pending review/test/commit: let's close this bug when Richi's patch lands in trunk.
[Bug ipa/65972] ICE after applying a patch to enable verify_ssa with auto-pgo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972 --- Comment #8 from Sebastian Pop --- Yes please! This patch also solves the problem I was chasing a week or so ago: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00067.html I also know that this is ICE-ing on a large proprietary project when I compile it with autoFDO on gcc-5.x and 6.x releases.
[Bug driver/79637] missing documentation for PARAM_MAX_FSM_THREAD_LENGTH
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79637

--- Comment #3 from Sebastian Pop ---
As to why we call it a "finite state automaton" jump threading: this
transform proves useful when the switch statement in the previous example is
contained in a loop, which is the way most people implement a parser or a
finite state machine. Some of these automata implement state transitions by
setting the next state in one of the cases. To continue the example from the
previous comment, here is what a two-state machine looks like:

c = 1;
while (1) {
  switch (c) {
    case 1:
      c = 5;
      break;
    case 5:
      c = 1;
      break;
  }
}

and after jump threading, it would look like this:

c = 1;
label1:
  c = 5;
  goto label2;
label2:
  c = 1;
  goto label1;

which is much faster than having to take the loop back-edge + jump from
switch to case.
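The two-state machine above loops forever, so here is a compilable sketch of
both forms with a step counter added so that they terminate. The counter and
the function names run_switch and run_threaded are assumptions made for the
illustration; the threaded version is written by hand and is not actual
compiler output.

```c
/* Loop + switch form: every transition goes back through the loop header
   and the switch dispatch.  Returns the state after `steps` transitions.  */
int run_switch(int steps)
{
    int c = 1;
    while (steps-- > 0) {
        switch (c) {
        case 1: c = 5; break;
        case 5: c = 1; break;
        }
    }
    return c;
}

/* Hand-threaded form: each state jumps directly to its known successor.  */
int run_threaded(int steps)
{
    int c = 1;
label1:
    if (steps-- <= 0)
        return c;
    c = 5;
    goto label2;
label2:
    if (steps-- <= 0)
        return c;
    c = 1;
    goto label1;
}
```

Both functions return the same state for any number of steps; the threaded
one never evaluates the switch.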
[Bug driver/79637] missing documentation for PARAM_MAX_FSM_THREAD_LENGTH
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79637

--- Comment #2 from Sebastian Pop ---
Here is what I see in doc/invoke.texi:

@item max-fsm-thread-path-insns
Maximum number of instructions to copy when duplicating blocks on a
finite state automaton jump thread path.  The default is 100.

@item max-fsm-thread-length
Maximum number of basic blocks on a finite state automaton jump thread
path.  The default is 10.

@item max-fsm-thread-paths
Maximum number of new jump thread paths to create for a finite state
automaton.  The default is 50.

I think these parameters are quite technical. The rule is that all the magic
constants should have a param instead of hard coding them in the code, so
they get exposed to the users of the compiler that way.

Roland, I would have liked to point you to a paper that describes the
algorithm for backwards jump-threading, although we have not written one yet.
Jeff, I think it would be good if I take the time to write that paper, and I
will ask you, James, and Brian to co-sign the paper.

Here is a short description of how the backwards jump-threading works:

We start by looking for a switch or condition statement of the form
"switch(c)". Then we follow the SSA definitions backwards from "c" until we
reach a place in the program where the value of "c" is statically known at
compile time. To make the example simple, let's say we reach a statement that
sets "c = 5". With that information in hand, we create a new path that starts
from the basic block that sets "c = 5" and ends in the target block of the
switch "case 5:". This is done by duplicating all the basic blocks on the
path from "c = 5" to the target of the now known value of the condition.

max-fsm-thread-length is the bound on the number of basic blocks on that
path, such that we do not increase too much the code size of the program.
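As a hand-written sketch of the description above: in before(), the value of
"c" is known to be 5 on one of the paths that reach the switch. In after(),
that path jumps straight to the "case 5:" target and skips the dispatch. The
function names, the return values, and the label case_5 are hypothetical, and
the code only illustrates the shape of the transform, not what GCC emits.

```c
/* Before threading: both arms join at the switch, which dispatches on c
   even on the path where c is known to be 5.  */
int before(int x)
{
    int c;
    if (x > 0)
        c = 5;      /* c is statically known on this path */
    else
        c = x;
    switch (c) {
    case 5:  return 50;
    case 0:  return 10;
    default: return -1;
    }
}

/* After threading: the path from "c = 5" goes directly to the code of
   "case 5:"; the other path still goes through the switch.  */
int after(int x)
{
    int c;
    if (x > 0) {
        c = 5;
        goto case_5;   /* threaded edge, no switch dispatch */
    }
    c = x;
    switch (c) {
    case 5:
    case_5:
        return 50;
    case 0:  return 10;
    default: return -1;
    }
}
```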
[Bug tree-optimization/69675] [6/7 Regression] [graphite] ICE: verify_ssa failed (definition in block 42 does not dominate use in block 34)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69675

--- Comment #10 from Sebastian Pop ---
(In reply to Richard Biener from comment #9)
> Yeah, seems to be gone with ISL 0.18 here as well... (but with 0.16.1 I can
> still reproduce it). ISL 0.18 doesn't do anything to the loop. ISL 0.16.1
> just did some IV transforms it seems:
>
> [scheduler] original ast:
> for (int c0 = 0; c0 <= -P_14; c0 += 1)
>   for (int c1 = 0; c1 <= 3; c1 += 1) {
>     S_5(c0, c1);
>     if (c1 <= 2)
>       S_6(c0, c1);
>   }
>
> [scheduler] AST generated by isl:
> for (int c0 = 0; c0 <= -P_14; c0 += 1)
>   for (int c1 = 3 * c0; c1 <= 3 * c0 + 3; c1 += 1) {
>     S_5(c0, -3 * c0 + c1);

I don't know why isl started the inner loop at 3*c0:
In the end we have the identity for the array subscript: 3*c0 - 3*c0 = 0
Could be a bug in the older isl.

>     if (3 * c0 + 2 >= c1)
>       S_6(c0, -3 * c0 + c1);
>   }
>
> and with ISL 0.18 we have
>
> [scheduler] isl optimized schedule is identical to the original schedule.
> for (int c0 = 0; c0 <= -P_14; c0 += 1)
>   for (int c1 = 0; c1 <= 3; c1 += 1) {
>     S_5(c0, c1);
>     if (c1 <= 2)
>       S_6(c0, c1);
>   }
>
> and eventually code generation is not happy with the changed form
> (-fgraphite-identity is fine).
>
> Sebastian, any comment? I think we could still for example require current
> ISL
> for GCC 6 (0.18 or maybe 0.17.1). Or at least drop support for the current
> legacy.

I would like to move away from the older isl versions: newer isl versions
have fewer bugs, and people also worked on making isl faster.

Moving to a newer isl would also allow us to clean up the #ifdef's from the
graphite-*.c files, which will make the code easier to read.
[Bug tree-optimization/68823] [6/7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823

--- Comment #11 from Sebastian Pop ---
(In reply to Richard Biener from comment #10)
> But then with different number of subscripts (and also likely different
> DR_BASE_OBJECT) you can't do anything with them and have to assume
> dependence. See initialize_data_dependence_relation:
>
>   /* If the references do not access the same object, we do not know
>      whether they alias or not. We do not care about TBAA or alignment
>      info so we can use OEP_ADDRESS_OF to avoid false negatives.
>      But the accesses have to use compatible types as otherwise the
>      built indices would not match. */
>   if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b),
>                         OEP_ADDRESS_OF)
>       || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)),
>                               TREE_TYPE (DR_BASE_OBJECT (b))))
>     {
>       DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>       return res;
>
> not sure how you communicate that to ISL of course... is it what you
> use "alias-sets" for? To create extra dependence egdes?

alias-sets differ for two arrays with bases that have been proven to be
different. If they may point to the same thing, they will have the same
number.
[Bug tree-optimization/68823] [6/7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823

--- Comment #9 from Sebastian Pop ---
/* Determines the base object and the list of indices of memory reference
   DR, analyzed in LOOP and instantiated in loop nest NEST.  */

static void
dr_analyze_indices (struct data_reference *dr, loop_p nest, loop_p loop)

This function initializes the subscripts with their access functions:

  DR_ACCESS_FNS (dr) = access_fns;

The number of subscripts (or "dimensions") is then the length of that array:

#define DR_NUM_DIMENSIONS(DR) DR_ACCESS_FNS (DR).length ()
[Bug tree-optimization/68823] [6/7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823 --- Comment #8 from Sebastian Pop --- The code at fault is called from pdr_add_memory_accesses(). Maybe the problem is in parsing the gimple MEM[] into a data reference.
[Bug tree-optimization/68823] [6/7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823 --- Comment #7 from Sebastian Pop --- (In reply to Martin Liška from comment #5)
> Created attachment 40662 [details]
> Isolated graphite dump for miscompiled function
>
> As shown in the dump file, there are dependencies for the problematic stmts:
>
> Adding must write to depedence graph: pdr_121 (write
> in gimple stmt: MEM[(Element_t[2] &)_7][0] = _9;
> data accesses: { S_3[i2] -> [2, o1, 0] : 8*floor((o1)/8) = o1 and
> 18446744073709551616*floor((8i2 - o1)/18446744073709551616) = 8i2 - o1 and 0
> <= o1 <= 18446744073709551608 }
>
> Adding read to depedence graph: pdr_124 (read
> in gimple stmt: _15 = MEM[(int *)_14];
> data accesses: { S_6[i1] -> [2, o1] : 18446744073709551616*floor((-8i1 +
> o1)/18446744073709551616) = -8i1 + o1 and 0 <= o1 <= 18446744073709551608 }
>
> If I understand the notation correctly, both have equal alias set (2). Do
> you see Sebastian why the dependence is not caught?
>

S_3[i2] -> [2, o1, 0]
S_6[i1] -> [2, o1]

We do not detect the dependence because the two arrays do not have the same number of subscripts. On the gimple representation we also have

MEM[(Element_t[2] &)_7][0] = _9;
vs.
_15 = MEM[(int *)_14];
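As a rough illustration of the conservative rule discussed in this PR (equal alias set but a different number of subscripts means the accesses cannot be compared, so a dependence has to be assumed), here is a minimal sketch. The struct access type and may_depend() are hypothetical and far simpler than graphite's poly_dr or the isl maps in the dump; subscripts are plain constants here, which is an assumption made to keep the example short.

```c
#include <stdbool.h>

#define MAX_SUBSCRIPTS 4

/* Hypothetical, simplified stand-in for a memory access: an alias set
   number followed by constant subscripts. */
struct access {
    int alias_set;
    int n_subscripts;
    long subscript[MAX_SUBSCRIPTS];
};

/* Returns true when the two accesses may touch the same memory. */
static bool may_depend(const struct access *a, const struct access *b)
{
    /* Different alias sets: the bases have been proven to be different. */
    if (a->alias_set != b->alias_set)
        return false;

    /* Same alias set but shapes that cannot be compared subscript by
       subscript: be conservative and assume a dependence. */
    if (a->n_subscripts != b->n_subscripts)
        return true;

    /* Comparable shapes: independent as soon as one subscript differs. */
    for (int k = 0; k < a->n_subscripts; k++)
        if (a->subscript[k] != b->subscript[k])
            return false;
    return true;
}
```

With this rule, a write shaped like [2, o1, 0] and a read shaped like [2, o1] would be reported as possibly dependent instead of being silently treated as independent.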
[Bug tree-optimization/68823] [6/7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823 --- Comment #4 from Sebastian Pop --- The data dependence relations are dumped in the output of -fdump-tree-graphite-all. graphite-dependences.c contains the code for the data dependence computations. Looking at the gimple code it seems like a trivial write after write dependence. Do we have a reduced testcase for this problem?
[Bug tree-optimization/77362] [6/7 Regression] [graphite] ICE in sese_build_liveouts_use w/ -O2 -floop-nest-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77362 --- Comment #10 from Sebastian Pop --- (In reply to Richard Biener from comment #9) > Yeah, but the user can write such dependences himself so ideally we have > a way to undo them, like by using local scratch memory? So You are right. LLVM-Polly has a pass that undoes LIM, it is non trivial, and furthermore we'd better catch the LIM once the loop transforms are done! > > x_0 = 1; > > loop: > # x_1 = PHI> ... > x_2 = ...; > goto loop; > > turns into > > mem = 1; > > loop: > x_1 = mem; > x_2 = ...; > mem = x_2; > goto loop; > > plus replacement of exit PHIs with loads. Would that help? That's how we were handling reductions and end of loop values in the dependence graph. Today we can reason about scalars themselves and add the scalars to the dependence graph instead of generating the loads that would need to be cleaned up after graphite.
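The rewrite quoted above can be pictured at the source level roughly as follows. This is only a sketch: the loop body (adding v[i]) is an assumption, since the quoted pseudo-code elides it, and the function names are made up for the example.

```c
/* Scalar form: x is carried from one iteration to the next, which is what
   the loop PHI node expresses in SSA. */
static int reduce_scalar(const int *v, int n)
{
    int x = 1;                      /* x_0 = 1 */
    for (int i = 0; i < n; i++)
        x = x + v[i];               /* x_2 computed from x_1 */
    return x;                       /* value seen through the exit PHI */
}

/* Memory form: the carried value goes through a scratch memory cell, so
   the dependence becomes a store followed by a load on `mem`. */
static int reduce_through_memory(const int *v, int n)
{
    int mem = 1;                    /* mem = 1 */
    for (int i = 0; i < n; i++) {
        int x1 = mem;               /* x_1 = mem */
        int x2 = x1 + v[i];         /* x_2 = ... */
        mem = x2;                   /* mem = x_2 */
    }
    return mem;                     /* exit PHI replaced by a load */
}
```

Both functions compute the same value; the second one only makes the loop-carried value visible as memory traffic, which is the part that would have to be cleaned up again after graphite.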
[Bug tree-optimization/77362] [6/7 Regression] [graphite] ICE in sese_build_liveouts_use w/ -O2 -floop-nest-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77362 --- Comment #8 from Sebastian Pop --- LIM in general is bad for loop transforms: it introduces loop carried dependences. If we can move graphite before LIM that would solve some problems.
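One way to picture this, as a hand-written sketch rather than actual compiler output: when the store to out[i] is moved out of the inner loop, the accumulation goes through a scalar temporary. Each iteration of the outer loop used to touch its own out[i], but after the motion every outer iteration writes and reads the same scalar, which adds dependences carried by the outer loop. The array sizes and function names are assumptions made for the example.

```c
#define N 4
#define M 3

/* Before: the reduction lives in memory; different i iterations use
   different cells out[i], so the i loop carries no dependence. */
static void rowsum_memory(int out[N], int in[N][M])
{
    for (int i = 0; i < N; i++) {
        out[i] = 0;
        for (int j = 0; j < M; j++)
            out[i] += in[i][j];
    }
}

/* After store motion: the reduction goes through `tmp`. Every i iteration
   now reuses the same scalar, so the i loop carries write-after-write and
   write-after-read dependences on `tmp`. */
static void rowsum_after_lim(int out[N], int in[N][M])
{
    int tmp;
    for (int i = 0; i < N; i++) {
        tmp = 0;
        for (int j = 0; j < M; j++)
            tmp += in[i][j];
        out[i] = tmp;
    }
}
```

The results are identical; only the dependence structure seen by a loop transform differs.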
[Bug tree-optimization/77362] [6/7 Regression] [graphite] ICE in sese_build_liveouts_use w/ -O2 -floop-nest-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77362 --- Comment #7 from Sebastian Pop --- The fix looks good. Thanks!
[Bug tree-optimization/77605] [5/6/7 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77605 --- Comment #6 from Sebastian Pop --- The proposed change looks good to me. "last_conflicts" is the max index in the conflicting functions for which there is a dependence: mem_access_a (conflicting_iterations_in_a (last_conflicts)) is in dependence with mem_access_b (conflicting_iterations_in_b (last_conflicts)).
[Bug tree-optimization/70956] ICE in build_cross_bb_scalars_def, at graphite-scop-detection.c:1725
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70956 --- Comment #2 from Sebastian Pop --- The change looks good to me.
[Bug middle-end/70159] missed CSE optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159 --- Comment #9 from Sebastian Pop --- Created attachment 37927 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37927=edit patch for hoisting expressions Updated the patch from PR23286 to hoist the redundant expressions: : inv_4 = 1.0e+0 / d_3(D); _18 = min_5(D) - a_6(D); _19 = _18 / inv_4; _20 = max_9(D) - a_6(D); _21 = _20 / inv_4; if (inv_4 >= 0.0) goto ; else goto ; : : # tmin_1 = PHI <_19(2), _21(3)> # tmax_2 = PHI <_21(2), _19(3)> _16 = tmin_1 + tmax_2; return _16; The attached patch does not pass make check and causes some infinite recursion.
[Bug middle-end/70159] missed CSE optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159 --- Comment #7 from Sebastian Pop --- (In reply to Andrew Pinski from comment #6) > Note this is both a hoisting and a sinking issue. > Hoisting should happen before sinking. > LLVM looks like it only implements sinking. You are right: LLVM does sinking very early as part of instcombine: it transforms the phi nodes after the if into selects over the operands and sinks the sub and mul after the select. By the time other redundancy elimination passes are executed the shape of the code is more difficult to optimize.
[Bug middle-end/70159] missed CSE optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159 --- Comment #2 from Sebastian Pop --- Right, with -Ofast it should be able to optimize away the branch or selects. The original benchmark had something more complex than fadd to use the tmin and tmax results. Here is one more test using the results in a non-commutative operation:

bool foo_p(float d, float min, float max, float a)
{
  float tmin;
  float tmax;
  float inv = 1.0f / d;
  if (inv >= 0) {
    tmin = (min - a) * inv;
    tmax = (max - a) * inv;
  } else {
    tmin = (max - a) * inv;
    tmax = (min - a) * inv;
  }
  return tmax > tmin;
}
[Bug middle-end/70159] New: missed CSE optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159 Bug ID: 70159 Summary: missed CSE optimization Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: spop at gcc dot gnu.org Target Milestone: ---

$ cat h.c
float foo_p(float d, float min, float max, float a)
{
  float tmin;
  float tmax;
  float inv = 1.0f / d;
  if (inv >= 0) {
    tmin = (min - a) * inv;
    tmax = (max - a) * inv;
  } else {
    tmin = (max - a) * inv;
    tmax = (min - a) * inv;
  }
  return tmax + tmin;
}

$ gcc h.c -Ofast -S -o-
foo_p:
        fmov    s4, 1.0e+0
        fdiv    s0, s4, s0
        fcmpe   s0, #0.0
        blt     .L6
        fsub    s1, s1, s3
        fsub    s2, s2, s3
        fmul    s1, s1, s0
        fmul    s0, s2, s0
        fadd    s0, s1, s0
        ret
        .p2align 3
.L6:
        fsub    s4, s2, s3
        fsub    s2, s1, s3
        fmul    s1, s4, s0
        fmul    s0, s2, s0
        fadd    s0, s1, s0
        ret

$ clang h.c -Ofast -S -o-
foo_p:                                  // @foo_p
// BB#0:                                // %entry
        fmov    s4, #1.
        fdiv    s0, s4, s0
        fcmp    s0, #0.0
        fcsel   s4, s1, s2, lt
        fcsel   s1, s2, s1, lt
        fsub    s1, s1, s3
        fsub    s2, s4, s3
        fadd    s1, s2, s1
        fmul    s0, s1, s0
        ret

The computations in both branches are redundant. Even without if-conversion (fcsel), GCC should be able to sink/hoist fsub and fmul.
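Written by hand at the source level, the transformation asked for here could look like the sketch below: select the operands first, then perform a single fsub and fmul per result. The names foo_p_branchy and foo_p_sunk are made up for the comparison; this is an illustration of the expected shape, not the output of any GCC pass.

```c
/* The function from the report, unchanged apart from its name. */
static float foo_p_branchy(float d, float min, float max, float a)
{
    float tmin;
    float tmax;
    float inv = 1.0f / d;
    if (inv >= 0) {
        tmin = (min - a) * inv;
        tmax = (max - a) * inv;
    } else {
        tmin = (max - a) * inv;
        tmax = (min - a) * inv;
    }
    return tmax + tmin;
}

/* Same computation with the subtraction and multiplication sunk below the
   condition: only the operands are selected, in the spirit of the fcsel
   instructions in the clang output. */
static float foo_p_sunk(float d, float min, float max, float a)
{
    float inv = 1.0f / d;
    int swap = !(inv >= 0);          /* true exactly when the else branch runs */
    float lo = swap ? max : min;     /* operand feeding tmin */
    float hi = swap ? min : max;     /* operand feeding tmax */
    float tmin = (lo - a) * inv;
    float tmax = (hi - a) * inv;
    return tmax + tmin;
}
```

Both versions perform the same floating-point operations on the same operands in the same order, so they agree without needing -Ofast; the second one just has one copy of each operation instead of two.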
[Bug middle-end/69545] [6 Regression] FAIL: gfortran.dg/graphite/pr42285.f90 -O (internal compiler error)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69545 Sebastian Pop changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |spop at gcc dot gnu.org --- Comment #1 from Sebastian Pop --- I guess this issue is due to isl-0.14. With isl-0.15 it is passing. I will have a look.
[Bug middle-end/69545] [6 Regression] FAIL: gfortran.dg/graphite/pr42285.f90 -O (internal compiler error)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69545 Sebastian Pop changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from Sebastian Pop --- Fixed in r232966 by reverting r232939.
[Bug tree-optimization/68343] FAIL: gcc.dg/graphite/fuse-{1,2}.c scan-tree-dumps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68343 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #8 from Sebastian Pop --- fixed in r232811.
[Bug tree-optimization/68398] [6 Regression] coremark regression due to r229685
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68398 --- Comment #4 from Sebastian Pop --- Thanks Jeff for looking into this issue. I was thinking about a heuristic as you mentioned in comment #2: what about allowing creation of irreducible loops, multiple latches, etc. after the loop optimizers are done?
[Bug tree-optimization/69341] [6 Regression] [graphite] ICE: verify_ssa failed (error: definition in block 37 does not dominate use in block 30)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69341 Bug 69341 depends on bug 68692, which changed state. Bug 68692 Summary: [6 Regression][graphite] ice: Segmentation fault https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68692 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/68692] [6 Regression][graphite] ice: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68692 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #8 from Sebastian Pop --- fixed in r232659.
[Bug tree-optimization/69292] [6 Regression][graphite] ICE with -floop-nest-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69292 Sebastian Pop changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #6 from Sebastian Pop --- fixed at r232659 *** This bug has been marked as a duplicate of bug 68692 ***
[Bug tree-optimization/68692] [6 Regression][graphite] ice: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68692 --- Comment #9 from Sebastian Pop --- *** Bug 69292 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/68976] [6 Regression] ICE w/ -O2 (and above) -fgraphite-identity (or -floop-nest-optimize)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68976 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #16 from Sebastian Pop --- fixed in r232658.
[Bug tree-optimization/68756] [6 Regression] ICE w/ -O1 -floop-nest-optimize and isl 0.15: isl-0.15/isl_id.c:213: unable to find id
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68756 Sebastian Pop changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |spop at gcc dot gnu.org --- Comment #2 from Sebastian Pop --- Thanks for the nice reduced testcase. I will have a look.
[Bug tree-optimization/68659] [6 regression] FAIL: gcc.dg/graphite/id-pr45230-1.c (internal compiler error)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68659 Sebastian Pop changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|DUPLICATE |--- --- Comment #10 from Sebastian Pop --- Thanks for reporting. I will have a look at what happens on arm.
[Bug bootstrap/68667] [6 Regression] GCC trunk build fails compiling graphite-isl-ast-to-gimple.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68667 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #3 from Sebastian Pop --- fixed in r231223
[Bug tree-optimization/68692] [6 Regression] ice: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68692 Sebastian Pop changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |spop at gcc dot gnu.org --- Comment #3 from Sebastian Pop --- I'm looking at it.
[Bug tree-optimization/68693] [6 Regression] ice: in harmful_stmt_in_region, at graphite-scop-detection.c:1052
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68693 --- Comment #3 from Sebastian Pop --- Author: spop Date: Fri Dec 4 21:36:55 2015 New Revision: 231309 URL: https://gcc.gnu.org/viewcvs?rev=231309=gcc=rev Log: fix PR68693: Check for loop structure when extending the SCoP The check for dominance while extending the scop assumed that multiple successors meant a loop which is not true in case of conditionals around the loop. Improved pretty printers for better debugging. PR tree-optimization/68693 * graphite-scop-detection.c (dot_all_sese): New (dot_all_scops_1): Renamed to dot_all_sese. (dot_all_scops): Removed. (dot_sese): New. (dot_cfg): New. (scop_detection::get_nearest_dom_with_single_entry): Check that preds are from different loop levels. (scop_detection::get_nearest_pdom_with_single_exit): Check that succs are from different loop levels. (scop_detection::print_sese): Inlined. (scop_detection::print_edge): New. (scop_detection::merge_sese): Added dumps. * graphite.h: Add declarations. gcc/testsuite/ChangeLog: * gfortran.dg/graphite/pr68693.f90: New test. Added: trunk/gcc/testsuite/gfortran.dg/graphite/pr68693.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/graphite-scop-detection.c trunk/gcc/graphite.h trunk/gcc/testsuite/ChangeLog
[Bug tree-optimization/68693] [6 Regression] ice: in harmful_stmt_in_region, at graphite-scop-detection.c:1052
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68693 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #4 from Sebastian Pop --- fixed
[Bug tree-optimization/68550] [6 Regression] ICE: verify_gimple failed Error: missing PHI def
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68550 Sebastian Pop changed: What|Removed |Added CC||sch...@linux-m68k.org --- Comment #6 from Sebastian Pop --- *** Bug 68659 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/68659] [6 regression] FAIL: gcc.dg/graphite/id-pr45230-1.c (internal compiler error)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68659 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #8 from Sebastian Pop --- Most likely fixed in r231206. *** This bug has been marked as a duplicate of bug 68550 ***
[Bug tree-optimization/68659] [6 regression] FAIL: gcc.dg/graphite/id-pr45230-1.c (internal compiler error)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68659 --- Comment #6 from Sebastian Pop --- I do not see the error on today's trunk at r231233. Could you please verify that this has been fixed by our changes from yesterday? Thanks!
[Bug tree-optimization/68550] [6 Regression] ICE: verify_gimple failed Error: missing PHI def
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68550 Sebastian Pop changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |spop at gcc dot gnu.org --- Comment #3 from Sebastian Pop --- Looking at it. Thanks for the nice reduced testcases.
[Bug tree-optimization/68550] [6 Regression] ICE: verify_gimple failed Error: missing PHI def
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68550 --- Comment #4 from Sebastian Pop --- Author: spop Date: Wed Dec 2 20:40:17 2015 New Revision: 231206 URL: https://gcc.gnu.org/viewcvs?rev=231206=gcc=rev Log: fix PR68550: do not handle ISL loop peeled statements In case ISL did some loop peeling, like this: S_8(0); for (int c1 = 1; c1 <= 5; c1 += 1) { S_8(c1); } S_8(6); we should not copy loop-phi nodes in S_8(0) or in S_8(6). PR tree-optimization/68550 * graphite-isl-ast-to-gimple.c (copy_loop_phi_nodes): Add dump. (copy_bb_and_scalar_dependences): Do not code generate loop peeled statements. * gfortran.dg/graphite/pr68550-1.f90: New. * gfortran.dg/graphite/pr68550-2.f90: New. Added: trunk/gcc/testsuite/gfortran.dg/graphite/pr68550-1.f90 trunk/gcc/testsuite/gfortran.dg/graphite/pr68550-2.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/graphite-isl-ast-to-gimple.c trunk/gcc/testsuite/ChangeLog
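For readers unfamiliar with loop peeling, the sketch below contrasts the peeled form from the commit log with a rolled loop over the same range. The rolled form is an assumption (the log only shows the peeled AST), and S_8 is replaced by a hypothetical function that logs its argument. The peeled copies S_8(0) and S_8(6) sit outside any loop, which is why there is no loop header for a loop-phi node to belong to.

```c
#define MAX_CALLS 16

/* Hypothetical log of the arguments S_8 was called with. */
struct call_log { int arg[MAX_CALLS]; int len; };

static void S_8(struct call_log *l, int c1)
{
    if (l->len < MAX_CALLS)
        l->arg[l->len++] = c1;
}

/* Assumed rolled form: all seven instances are inside the loop. */
static void run_rolled(struct call_log *l)
{
    for (int c1 = 0; c1 <= 6; c1 += 1)
        S_8(l, c1);
}

/* Peeled form as printed in the commit log: the first and last instances
   are straight-line code before and after the loop. */
static void run_peeled(struct call_log *l)
{
    S_8(l, 0);
    for (int c1 = 1; c1 <= 5; c1 += 1) {
        S_8(l, c1);
    }
    S_8(l, 6);
}
```

Both functions call S_8 with 0 through 6 in order; only the control flow around the first and last call differs.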
[Bug tree-optimization/68550] [6 Regression] ICE: verify_gimple failed Error: missing PHI def
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68550 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #5 from Sebastian Pop --- fixed
[Bug middle-end/68565] [6 Regression] graphite : -O2 -floop-nest-optimize miscompile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68565 --- Comment #2 from Sebastian Pop --- Author: spop Date: Mon Nov 30 20:39:16 2015 New Revision: 231086 URL: https://gcc.gnu.org/viewcvs?rev=231086=gcc=rev Log: check for ISL generated code that leads to division by zero we used to generate modulo and division by zero because ISL uses big numbers which translate to zero in modulo arithmetic. The patch also improves error handling and bails out early in case of wrong code gen. PR tree-optimization/68565 * graphite-isl-ast-to-gimple.c (binary_op_to_tree): Early return on codegen_error. Fail when rhs of division operations is integer_zerop. (ternary_op_to_tree): Early return on codegen_error. (unary_op_to_tree): Same. (nary_op_to_tree): Same. (gcc_expression_from_isl_expr_op): Same. (gcc_expression_from_isl_expression): Same. (graphite_create_new_loop): On codegen_error continue generating wrong code. (graphite_create_new_loop_guard): Same. (build_iv_mapping): Same. (graphite_create_new_guard): Same. * gfortran.dg/graphite/pr68565.f90: New. Added: trunk/gcc/testsuite/gfortran.dg/graphite/pr68565.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/graphite-isl-ast-to-gimple.c trunk/gcc/testsuite/ChangeLog
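The failure mode described in this log can be reproduced in a few lines. The sketch assumes a 64-bit unsigned type and uses 2^64 as the "big number" purely for illustration; checked_mod() is a hypothetical guard in the spirit of the fix (refuse to generate the operation when the divisor is zero), not the code that was committed.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2^64 reduced to 64 bits. Unsigned arithmetic wraps around, so the
   result is 0 even though the mathematical value is not. */
static uint64_t two_pow_64_truncated(void)
{
    uint64_t m = (uint64_t)1 << 63;
    return m * 2u;
}

/* Hypothetical guard: report failure instead of dividing by zero.
   On failure *out is left untouched. */
static bool checked_mod(uint64_t x, uint64_t m, uint64_t *out)
{
    if (m == 0)
        return false;
    *out = x % m;
    return true;
}
```

A caller that gets false back can give up on the transformed code instead of emitting a modulo by zero.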
[Bug middle-end/68565] [6 Regression] graphite : -O2 -floop-nest-optimize miscompile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68565 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #3 from Sebastian Pop --- Fixed. Thanks for the testcase!
[Bug tree-optimization/68453] [6 Regression] graphite ICE: segfault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68453 Sebastian Pop changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #4 from Sebastian Pop --- fixed in r230918
[Bug tree-optimization/67984] [GRAPHITE] internal compiler error: isl_ctx freed, but some objects still reference it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67984 Sebastian Pop changed: What|Removed |Added Status|WAITING |RESOLVED Known to work||6.0 Resolution|--- |FIXED Known to fail||5.2.1 --- Comment #4 from Sebastian Pop --- Fixed in trunk gcc 6.0 at r230826.
[Bug tree-optimization/68493] [6 Regression] [graphite] ICE in copy_loop_phi_args
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68493 Sebastian Pop changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from Sebastian Pop --- fixed in r230772.
[Bug middle-end/68279] ICE: in create_pw_aff_from_tree, at graphite-sese-to-poly.c:836
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68279 Sebastian Pop changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #7 from Sebastian Pop --- Fixed in r230771
[Bug middle-end/68314] [6 Regression] Invalid read in build_pbb_minimal_scattering_polyhedrons (graphite-sese-to-poly.c:148)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68314 --- Comment #2 from Sebastian Pop --- This patch exposes the problem without valgrind: diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index 2054fad..b932dae 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -143,6 +143,9 @@ build_pbb_minimal_scattering_polyhedrons (isl_aff *static_sched, poly_bb_p pbb, /* False for loop dimension. */ sequence_and_loop_dims[i + j] = false; } + + gcc_assert (nb_sequence_dim > j); + /* Fake loops make things shifted by one. */ if (sequence_dims && sequence_dims[j] == i) sequence_and_loop_dims[i + j] = true;
[Bug tree-optimization/67984] [GRAPHITE] internal compiler error: isl_ctx freed, but some objects still reference it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67984 Sebastian Pop changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2015-11-23 CC||spop at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from Sebastian Pop --- I cannot reproduce the error on GCC 6.0 trunk. Also, please provide a reduced testcase, the attached testcase fails with: In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/immintrin.h:43:0, from /usr/include/CL/cl_platform.h:441, from /usr/include/CL/cl.h:30, from /usr/include/CL/opencl.h:42, from dcttest.c:61: /usr/lib/gcc/x86_64-linux-gnu/5/include/avx2intrin.h: In function ‘_mm256_mpsadbw_epu8’: /usr/lib/gcc/x86_64-linux-gnu/5/include/avx2intrin.h:46:12: error: can’t convert a value of type ‘int’ to vector type ‘__vector(4) long long int’ which has different size
[Bug middle-end/68314] [6 Regression] Invalid read in build_pbb_minimal_scattering_polyhedrons (graphite-sese-to-poly.c:148)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68314 Sebastian Pop changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #3 from Sebastian Pop --- fixed in r230778
[Bug middle-end/68279] ICE: in create_pw_aff_from_tree, at graphite-sese-to-poly.c:836
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68279 --- Comment #5 from Sebastian Pop --- After fixing the graphite fail, I get these warnings from the testcase in comment4: FAIL: gfortran.dg/graphite/pr68279.f90 -O (test for excess errors) Excess errors: /work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:21:19: Warning: Legacy Extension: REAL array index at (1) /work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:22:25: Warning: Legacy Extension: REAL array index at (1) /work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:22:41: Warning: Legacy Extension: REAL array index at (1) /work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:22:29: Warning: Legacy Extension: REAL array index at (1) /work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:22:75: Warning: Legacy Extension: REAL array index at (1) /work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:22:86: Warning: Legacy Extension: REAL array index at (1) /work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:24:27: Warning: Legacy Extension: REAL array index at (1) /work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:24:36: Warning: Legacy Extension: REAL array index at (1) /work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:25:16: Warning: Legacy Extension: REAL array index at (1) /work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:25:34: Warning: Legacy Extension: REAL array index at (1) Is there a flag I can set to avoid these warnings? Thanks!
[Bug tree-optimization/68493] [6 Regression] [graphite] ICE in copy_loop_phi_args
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68493 --- Comment #1 from Sebastian Pop --- Passes on ISL 0.14, fails with 0.15. This patch fixes it: we will bootstrap and commit. diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index 30c3a21..2783ac4 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -2760,6 +2760,8 @@ translate_isl_ast_to_gimple::translate_pending_phi_nodes () fprintf (dump_file, "[codegen] to new-phi: "); print_gimple_stmt (dump_file, new_phi, 0, 0); } + if (codegen_error) + return; } }
[Bug middle-end/68314] [6 Regression] Invalid read in build_pbb_minimal_scattering_polyhedrons (graphite-sese-to-poly.c:148)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68314 Sebastian Pop changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |spop at gcc dot gnu.org --- Comment #1 from Sebastian Pop --- Confirmed with ISL 0.15. I'm looking at it.
[Bug tree-optimization/68453] [6 Regression] graphite ICE: segfault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68453 Sebastian Pop changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |spop at gcc dot gnu.org --- Comment #2 from Sebastian Pop --- Confirmed.
[Bug tree-optimization/68335] [6 Regression][GRAPHITE] ICE: tree check: expected ssa_name, have real_cst in add_phi_arg_for_new_expr, at sese.c:1373
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68335 --- Comment #4 from Sebastian Pop --- testcase added in r230630
[Bug tree-optimization/68428] [6 Regression] [graphite] ICE in outermost_loop_in_sese w/ -O2 -floop-strip-mine or -O2 -floop-nest-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68428 Sebastian Pop changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||spop at gcc dot gnu.org Resolution|--- |FIXED --- Comment #1 from Sebastian Pop --- Fixed in r230632
[Bug tree-optimization/68335] [6 Regression][GRAPHITE] ICE: tree check: expected ssa_name, have real_cst in add_phi_arg_for_new_expr, at sese.c:1373
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68335 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #3 from Sebastian Pop --- This is fixed in trunk as of today. I will add the testcase.
[Bug tree-optimization/68341] [6 Regression] FAIL: gcc.dg/graphite/interchange-{1,11,13}.c (internal compiler error)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68341 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #4 from Sebastian Pop --- Fixed in r230631
[Bug tree-optimization/63602] [4.9/5 Regression] [graphite] Wrong code w/ -O2 -ftree-loop-nest-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63602 Sebastian Pop changed: What|Removed |Added Known to work||6.0 Summary|[4.9/5/6 Regression] Wrong |[4.9/5 Regression] |code w/ -O2 |[graphite] Wrong code w/ |-ftree-loop-nest-optimize |-O2 ||-ftree-loop-nest-optimize Known to fail|6.0 | --- Comment #5 from Sebastian Pop --- gcc 6.0 trunk does not go out of SSA anymore: we rewrote graphite's code gen and added all scalar dependences crossing basic blocks to the dependence graph.
[Bug tree-optimization/68398] New: coremark regression due to r229685
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68398 Bug ID: 68398 Summary: coremark regression due to r229685 Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: spop at gcc dot gnu.org Target Milestone: --- We have seen a performance regression due to r229685. We see fewer FSM jump threads on the reduced testcase. CC=2015-11-02-23-23-28-d3063db-trunk/bin/gcc $CC -O3 m.c -fdump-tree-dom1-details=a -o a.out CC=2015-11-02-23-25-06-f497d67-trunk/bin/gcc $CC -O3 m.c -fdump-tree-dom1-details=b -o b.out $ grep FSM a | wc -l 17 $ grep FSM b | wc -l 15 on x86_64 valgrind indicates that with the patch we have 2.5% more instructions executed: + valgrind --dsymutil=yes --tool=callgrind --callgrind-out-file=a.call ./a.out ==27524== Callgrind, a call-graph generating cache profiler ==27524== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al. ==27524== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info ==27524== Command: ./a.out ==27524== ==27524== For interactive control, run 'callgrind_control -h'. ==27524== ==27524== Events: Ir ==27524== Collected : 209839882 ==27524== ==27524== I refs: 209,839,882 + valgrind --dsymutil=yes --tool=callgrind --callgrind-out-file=b.call ./b.out ==27585== Callgrind, a call-graph generating cache profiler ==27585== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al. ==27585== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info ==27585== Command: ./b.out ==27585== ==27585== For interactive control, run 'callgrind_control -h'. 
==27585== ==27585== Events: Ir ==27585== Collected : 213154557 ==27585== ==27585== I refs: 213,154,557 + callgrind_annotate a.call Profile data file 'a.call' (creator: callgrind-3.10.0.SVN) I1 cache: D1 cache: LL cache: Timerange: Basic block 0 - 46055772 Trigger: Program termination Profiled target: ./a.out (PID 27524, part 1) Events recorded: Ir Events shown: Ir Event sort order: Ir Thresholds: 99 Include dirs: User annotated: Auto-annotation: off Ir 209,839,882 PROGRAM TOTALS Ir file:function 138,250,035 ???:core_bench_list [a.out] 69,160,889 ???:core_list_mergesort.constprop.2 [a.out] 2,309,860 ???:core_list_init [a.out] + callgrind_annotate b.call Profile data file 'b.call' (creator: callgrind-3.10.0.SVN) I1 cache: D1 cache: LL cache: Timerange: Basic block 0 - 48409229 Trigger: Program termination Profiled target: ./b.out (PID 27585, part 1) Events recorded: Ir Events shown: Ir Event sort order: Ir Thresholds: 99 Include dirs: User annotated: Auto-annotation: off Ir 213,154,557 PROGRAM TOTALS Ir file:function 138,845,638 ???:core_bench_list [b.out] 71,879,961 ???:core_list_mergesort.constprop.2 [b.out] 2,309,860 ???:core_list_init [b.out] $ cat m.c typedef struct list_data_s { short data16; short idx; } list_data; typedef struct list_head_s { struct list_head_s *next; struct list_data_s *info; } list_head; list_head *core_list_find(list_head *list,list_data *info); list_head *core_list_reverse(list_head *list); list_head *core_list_remove(list_head *item); list_head *core_list_undo_remove(list_head *item_removed, list_head *item_modified); list_head *core_list_insert_new(list_head *insert_point , list_data *info, list_head **memblock, list_data **datablock , list_head *memblock_end, list_data *datablock_end); typedef int(*list_cmp)(list_data *a, list_data *b); list_head *core_list_mergesort(list_head *list, list_cmp cmp); short state_scores[4] = {-29126, 24894, -24736, -272}; short matrix_scores[4] = {8151, -30381, -32453, 11169}; unsigned state_idx = 0, 
matrix_idx = 0; short calc_func(short *pdata
[Bug tree-optimization/68343] FAIL: gcc.dg/graphite/fuse-{1,2}.c scan-tree-dumps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68343 Sebastian Pop changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |spop at gcc dot gnu.org --- Comment #5 from Sebastian Pop --- You need ISL 0.15 to have these tests pass. Could you please report which ISL version you configured gcc with? I will try to get a check in the graphite.exp to only select fuse-* files when configured with ISL 0.15 or later.
[Bug middle-end/68279] ICE: in create_pw_aff_from_tree, at graphite-sese-to-poly.c:836
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68279 Sebastian Pop changed: What|Removed |Added CC||spop at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |spop at gcc dot gnu.org --- Comment #2 from Sebastian Pop --- I'll have a look.
[Bug tree-optimization/66070] [GRAPHITE] cc1 gets killed by OOM killer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66070 --- Comment #4 from Sebastian Pop --- r227572
[Bug tree-optimization/62113] [graphite] ICE using -floop-parallelize-all
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62113 Sebastian Pop changed: What|Removed |Added Status|NEW |RESOLVED CC||spop at gcc dot gnu.org Resolution|--- |FIXED --- Comment #3 from Sebastian Pop --- Fixed on trunk with a recent ISL-0.15 that contains the compute time out functions. $ time gcc -O2 -floop-parallelize-all -c rdft.i real 0m1.763s
[Bug middle-end/47598] -fgraphite-identity at -O2 breaks profiledbootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47598 Sebastian Pop changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #7 from Sebastian Pop --- Just completed a "make profiledbootstrap" on trunk with BOOT_CFLAGS="-g -O2 -fgraphite-identity -floop-nest-optimize" on an x86_64-linux machine.