[Bug target/109519] aarch64: wrong code with NEON intrinsics on gcc-10 and later

2023-04-15 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109519

--- Comment #5 from Sebastian Pop  ---
Thanks Andrew for the patch, it fixes the issue.

[Bug target/109519] New: aarch64: wrong code with NEON intrinsics on gcc-10 and later

2023-04-14 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109519

Bug ID: 109519
   Summary: aarch64: wrong code with NEON intrinsics on gcc-10 and later
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

Steps to reproduce:
$ git clone https://github.com/sebpop/bitshuffle.git -b gcc-10-bug
$ cd bitshuffle/reproduce
$ make
$ ./a.out

The expected output is produced by gcc-7, gcc-9, and clang-15. 
16384
4
14
16
33
39
45
51
57
67
102
108
120
126
128
134
138
140
[...]

gcc-9 is the last version of gcc I tested that works.

gcc-10 produces the following output:
./a.out
16384
0
0
0
0
39
45
51
57

gcc-11 and gcc-trunk produce the following output:
./a.out
16384
0
0
0
0
0
0
0

The output is also correct after removing the second-to-last patch from the git
repo: https://github.com/kiyo-masui/bitshuffle/pull/140
This patch exposes the bug in gcc by using NEON intrinsics instead of scalar
computations to translate move_mask instructions from SSE2 to NEON.
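
For reference, here is a minimal scalar sketch in C of what the SSE2 move-mask
operation (_mm_movemask_epi8) computes: bit i of the result is the most
significant bit of byte i.  This is not the code from the bitshuffle pull
request; the names movemask_u8_scalar and check_movemask are made up for
illustration.  Under that assumption, a NEON translation should return the same
mask as this reference for every input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: bit i of the result is the top bit of
   byte i, which is what SSE2's _mm_movemask_epi8 computes.  */
uint16_t movemask_u8_scalar(const uint8_t bytes[16])
{
  uint16_t mask = 0;
  for (int i = 0; i < 16; i++)
    mask |= (uint16_t)((bytes[i] >> 7) << i);
  return mask;
}

/* Small self-check that can be called from main ().  */
void check_movemask(void)
{
  uint8_t v[16] = { 0 };
  assert(movemask_u8_scalar(v) == 0);
  v[0] = 0x80;
  v[3] = 0x7f;   /* top bit clear: contributes nothing */
  v[15] = 0xff;
  assert(movemask_u8_scalar(v) == 0x8001);
}
```

Comparing the output of the intrinsics version against such a reference on
random inputs is one way to narrow down which compiler version miscompiles it.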

[Bug tree-optimization/107409] Perf loss ~5% on 519.lbm_r SPEC cpu2017 benchmark with r10-5090-ga9a4edf0e71bba

2023-02-02 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107409

Sebastian Pop  changed:

   What|Removed |Added

 CC||spop at gcc dot gnu.org

--- Comment #18 from Sebastian Pop  ---
A new 5% regression happened in gcc-trunk more recently and may be due to
another patch.

Rama was bisecting a 15% perf regression on lbm when updating gcc-7 to gcc-10.
The regression can be seen on the LNT graph link from comment#3 

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=633.477.0=683.477.0=664.477.0=648.477.0=618.477.0=605.477.0=759.477.0=584.477.0

gcc-6 has execution time of 213 seconds
gcc-7 is at 215 seconds
gcc-8 is at 266
gcc-9 at 259
gcc-10 at 260

Honza's patch seems to be unrelated to this older regression, which already
shows up in gcc-8: the patch was committed to trunk during gcc-10 development,
before the gcc-10 release on May 7, 2020:

commit a9a4edf0e71bbac9f1b5dcecdcf9250111d16889
Author: Jan Hubicka 
Date:   Sat Nov 30 22:25:24 2019 +0100

Update max_bb_count in execute_fixup_cfg


We need to git-bisect between gcc-7 and gcc-8.

[Bug debug/98776] DW_AT_low_pc is inconsistent with function entry address, when enabling -fpatchable-function-entry

2022-12-15 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from Sebastian Pop  ---
Fixed for arm64 as well on master, and backported to active branches gcc-12,
11, and 10.

[Bug debug/98776] DW_AT_low_pc is inconsistent with function entry address, when enabling -fpatchable-function-entry

2022-11-30 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776

--- Comment #10 from Sebastian Pop  ---
Patch for arm64:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607601.html

[Bug middle-end/107485] [10 Regression] gcc-10 ICE with -fnon-call-exception

2022-11-14 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107485

--- Comment #10 from Sebastian Pop  ---
Thanks Richard.
The patch fixed the larger test as well.

[Bug middle-end/107485] New: gcc-10 ICE with -fnon-call-exception

2022-10-31 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107485

Bug ID: 107485
   Summary: gcc-10 ICE with -fnon-call-exception
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

On arm64-linux I see the following crash only on gcc-10.
I do not see the ICE on gcc-11, 12, and trunk. 

$ ~/gcc-10/bld/gcc/cc1plus -fnon-call-exceptions f.ii
[...]
f.ii:29:23: internal compiler error: Segmentation fault
   29 |   template <int> void x(double *, b, unsigned long *) { f(); }
  |   ^
0x134e58b crash_signal
../../gcc/toplev.c:328
0x1639464 tree_vec_extract(gimple_stmt_iterator*, tree_node*, tree_node*,
tree_node*, tree_node*)
../../gcc/tree-vect-generic.c:140
0x163ca0f expand_vector_condition
../../gcc/tree-vect-generic.c:1044
0x164081f expand_vector_operations_1
../../gcc/tree-vect-generic.c:1988
0x16419f7 expand_vector_operations
../../gcc/tree-vect-generic.c:2240
0x1641b3f execute
../../gcc/tree-vect-generic.c:2284
[...]

$ cat f.ii
typedef long a;
typedef double b;
typedef struct {
  a c __attribute__((__vector_size__(32)));
  b d __attribute__((__vector_size__(32)));
} e;
__attribute__((__always_inline__)) b f() {
  e g, h, i;
  g.c = h.d < i.d;
}
class j {
  bool k();
};
template <typename aa, typename l, typename n> void ab(aa, l, n) {
  int o;
  typename n::p q;
  unsigned long r;
  q(0, o, &r);
}
namespace s {
template <typename n>
void t(j *, long, long, unsigned long *, int u) {
  n ac;
  void v();
  ab(v, u, ac);
}
} // namespace s
struct w {
  template <int> void x(double *, b, unsigned long *) { f(); }
  double ad;
  void operator()(double, double, unsigned long *) {
    unsigned long m;
    x<0>(&ad, 0, &m);
  }
};
using s::t;
struct y {
  using p = w;
};
long ag, ah;
unsigned long ai;
double aj;
bool j::k() {
  using n = y;
  t<n>(this, ag, ah, &ai, aj);
}



git bisect stops on this patch:

commit 1e676cfbe1e13fba2c636b560362ed4f0a56893d
Author: Richard Biener 
Date:   Mon May 18 08:51:23 2020 +0200

middle-end/95171 - inlining of trapping compare into non-call EH fn

This fixes always-inlining across -fnon-call-exception boundaries
for conditions which we do not allow to throw.

2020-05-18  Richard Biener  

PR middle-end/95171
* tree-inline.c (remap_gimple_stmt): Split out trapping compares
when inlining into a non-call EH function.

* gcc.dg/pr95171.c: New testcase.

(cherry picked from commit fe168751c5c1c517c7c89c9a1e4e561d66b24663)

[Bug debug/98776] DW_AT_low_pc is inconsistent with function entry address, when enabling -fpatchable-function-entry

2022-09-29 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776

Sebastian Pop  changed:

   What|Removed |Added

 CC||spop at gcc dot gnu.org

--- Comment #9 from Sebastian Pop  ---
Hi, is somebody working on fixing this on arm64?  If not I will be working on
it.

The linux kernel needs this fixed for systemtap and perf probe.

[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins

2022-05-16 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

Sebastian Pop  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #14 from Sebastian Pop  ---
Fixed.

[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins

2022-04-18 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

Sebastian Pop  changed:

   What|Removed |Added

  Attachment #52762|0   |1
is obsolete||

--- Comment #8 from Sebastian Pop  ---
Created attachment 52826
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52826&action=edit
patch

You are right.  Please see attached an amended patch that only adds the
barriers to __sync builtins.

[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins

2022-04-06 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

Sebastian Pop  changed:

   What|Removed |Added

  Attachment #52755|0   |1
is obsolete||

--- Comment #5 from Sebastian Pop  ---
Created attachment 52762
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52762&action=edit
patch

The attached patch fixes the issue for __sync builtins by adding the missing
barrier to -march=armv8-a+nolse path in the outline-atomics functions.

The patch also changes the behavior of __atomic builtins for -moutline-atomics
-march=armv8-a+nolse to be the same as for -march=armv8-a+lse.

[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins

2022-04-06 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

--- Comment #4 from Sebastian Pop  ---
The attached patch degrades performance on cpus with LSE: the barrier is not
needed when outline-atomics execute an LSE instruction.

I was thinking of adding the barrier to the armv8.0 generic path (no LSE) in
the outline-atomics functions.

[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins

2022-04-05 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

Sebastian Pop  changed:

   What|Removed |Added

  Attachment #52750|0   |1
is obsolete||

--- Comment #3 from Sebastian Pop  ---
Created attachment 52755
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52755&action=edit
patch

LSE atomics do not need a barrier.

Updated the patch to only generate the barriers after outline-atomics calls.

[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins

2022-04-05 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

--- Comment #2 from Sebastian Pop  ---
Created attachment 52750
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52750&action=edit
patch

Fix.

[Bug target/105162] [AArch64] outline-atomics drops dmb ish barrier on __sync builtins

2022-04-05 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

--- Comment #1 from Sebastian Pop  ---
Also happens when compiling with LSE: -march=armv8.1-a or later.

[Bug target/105162] New: [AArch64] outline-atomics drops dmb ish barrier on __sync builtins

2022-04-05 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162

Bug ID: 105162
   Summary: [AArch64] outline-atomics drops dmb ish barrier on __sync builtins
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

With -mno-outline-atomics gcc produces a `dmb ish` barrier on __sync builtins
as required by the Intel specification 
(see fix for https://gcc.gnu.org/PR65697 
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=f70fb3b635f9618c6d2ee3848ba836914f7951c2
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=ab876106eb689947cdd8203f8ecc6e8ac38bf5ba
)

$ cat a.c
int foo(int a)
{
  return __sync_bool_compare_and_swap(&a, 4, 5);
}
$ gcc -O2 a.c -S -o- -mno-outline-atomics 
foo:
sub sp, sp, #16
mov w1, 5
str w0, [sp, 12]
add x0, sp, 12
.L4:
ldxr    w2, [x0]
cmp w2, 4
bne .L5
stlxr   w3, w1, [x0]
cbnz    w3, .L4
.L5:
dmb ish
cset    w0, eq
add sp, sp, 16
ret

With -moutline-atomics gcc does not generate the barrier:

$ gcc -O2 a.c -S -o-  -moutline-atomics 
foo:
stp x29, x30, [sp, -32]!
mov w1, 5
mov x29, sp
add x2, sp, 28
str w0, [sp, 28]
mov w0, 4
bl  __aarch64_cas4_acq_rel
cmp w0, 4
cset    w0, eq
ldp x29, x30, [sp], 32
ret

Happens on gcc-8, 9, 10, 11, and trunk.
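
As a reminder of what the builtin is expected to do at the source level, here
is a minimal single-threaded sketch in C.  It only shows the compare-and-swap
result; it cannot show the missing barrier, which would need concurrent threads
on weakly ordered hardware to observe.  The helper names cas_4_to_5 and
check_cas are made up for illustration.

```c
#include <assert.h>

/* Same shape as foo () in the report, but operating on a caller-provided
   location so that the result can be inspected.  */
int cas_4_to_5(int *p)
{
  return __sync_bool_compare_and_swap(p, 4, 5);
}

/* Small self-check that can be called from main ().  */
void check_cas(void)
{
  int a = 4;
  int first = cas_4_to_5(&a);    /* value matches 4: swap happens */
  int second = cas_4_to_5(&a);   /* value is now 5: no swap */
  assert(first == 1);
  assert(second == 0);
  assert(a == 5);
}
```

The GCC manual describes the __sync builtins as full barriers in most cases,
which is why the `dmb ish` is expected after the exclusive load/store loop.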

[Bug rtl-optimization/99346] New: [aarch64] ICE in gen_rtx_SUBREG, at emit-rtl.c:1021

2021-03-02 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99346

Bug ID: 99346
   Summary: [aarch64] ICE in gen_rtx_SUBREG, at emit-rtl.c:1021
   Product: gcc
   Version: 8.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

Created attachment 50289
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50289&action=edit
pre-processed reduced testcase

gcc-8, gcc-9, and gcc-10 from Ubuntu 20.04 are failing to compile the attached
test at -O2 and -O3 on Graviton2 aarch64-linux.

$ g++-10 -O2 a.ii
[...]
a.ii:362:50: internal compiler error: in gen_rtx_SUBREG, at emit-rtl.c:1021


$ g++-8 -O2 a.ii
[...]
a.ii:493:11: internal compiler error: in gen_rtx_SUBREG, at emit-rtl.c:1010

A similar bug was reported and fixed on x86:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83723

[Bug c++/99012] gcc-8.4.0 on aarch64 hits internal error during RTL pass: expand if `std::copysign` is used

2021-02-08 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99012

--- Comment #3 from Sebastian Pop  ---
I do not see the bug with today's cc1plus from origin/releases/gcc-8

[Bug c++/99012] gcc-8.4.0 on aarch64 hits internal error during RTL pass: expand if `std::copysign` is used

2021-02-08 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99012

Sebastian Pop  changed:

   What|Removed |Added

 CC||spop at gcc dot gnu.org

--- Comment #2 from Sebastian Pop  ---
I see the bug with

$ gcc-8 --version
gcc-8 (Ubuntu/Linaro 8.4.0-1ubuntu1~18.04) 8.4.0

[Bug target/98877] New: [AArch64] Inefficient code generated for tbl NEON intrinsics

2021-01-28 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877

Bug ID: 98877
   Summary: [AArch64] Inefficient code generated for tbl NEON intrinsics
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

The code generated for NEON intrinsics is inefficient, which leads developers
to prefer inline assembly over intrinsics.

A similar performance bug for vmlal intrinsics was reported in
https://gcc.gnu.org/PR92665
The code generated by GCC for table lookups is also inefficient:

$ cat red.c
#include "arm_neon.h"

uint8x16_t fun(uint8x16_t lo, uint8x16_t hi, uint8x16_t idx) {
  uint8x16x2_t tab = { .val = {lo, hi} };
  uint8x16_t res = vqtbl2q_u8(tab, idx);
  return res;
}

$ gcc -O3 -S -o- red.c
fun:
mov v4.16b, v0.16b
mov v5.16b, v1.16b
tbl v0.16b, {v4.16b - v5.16b}, v2.16b
ret

$ clang -O3 -S -o- red.c
fun:
tbl v0.16b, { v0.16b, v1.16b }, v2.16b
ret

[Bug target/97802] New: [AArch64] Incorrect documentation for Arm64 NEON

2020-11-11 Thread spop at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97802

Bug ID: 97802
   Summary: [AArch64] Incorrect documentation for Arm64 NEON
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

The following text in doc/invoke.texi seems to be outdated.  To avoid confusion
the text needs to be more specific about which NEON implementations it applies
to:

"If the selected floating-point hardware includes the NEON extension
(e.g.@: @option{-mfpu=neon}), note that floating-point
operations are not generated by GCC's auto-vectorization pass unless
@option{-funsafe-math-optimizations} is also specified.  This is
because NEON hardware does not fully implement the IEEE 754 standard for
floating-point arithmetic (in particular denormal values are treated as
zero), so the use of NEON instructions may lead to a loss of precision."

This used to be true for older NEON implementations.
The NEON implementation in Armv8 and later is IEEE 754 compliant.

[Bug target/92665] [AArch64] low lanes select not optimized out for vmlal intrinsics

2020-03-31 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

--- Comment #7 from Sebastian Pop  ---
Hi Andrew, have you committed the fix for this?

[Bug target/92692] Saving off the callee saved register between ldxr/stxr (caused by shrink wrapping improvements)

2020-02-28 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92692

--- Comment #23 from Sebastian Pop  ---
> I don't see anything like that on the gcc-9 branch - are you sure you don't 
> have an outstanding change somehow?

You are right, a part of the -moutline-atomics patch that I am working on
backporting to branch 9 added that change.

[Bug target/92692] Saving off the callee saved register between ldxr/stxr (caused by shrink wrapping improvements)

2020-02-27 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92692

Sebastian Pop  changed:

   What|Removed |Added

 CC||spop at gcc dot gnu.org

--- Comment #21 from Sebastian Pop  ---
It looks like this hunk from the trunk version of the patch is missing on gcc-9
branch:

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index cabcc58f1a0..1458bc00095 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -104,7 +104,7 @@
(clobber (match_scratch:SI 7 "=&r"))]
   ""
   "#"
-  "&& reload_completed"
+  "&& epilogue_completed"
   [(const_int 0)]
   {
 aarch64_split_compare_and_swap (operands);



With this hunk applied my bootstrap passes on the gcc-9 branch on an
aarch64-linux graviton2.

Without this hunk I see an error in thread sanitizers.

I also have checked gcc-8 release branch and it seems that the patch is not
missing any hunks in that branch.

Could somebody apply the missing hunk to the gcc-9 release branch?  Thanks!

[Bug rtl-optimization/92665] New: [AArch64] low lanes select not optimized out for vmlal intrinsics

2019-11-25 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

Bug ID: 92665
   Summary: [AArch64] low lanes select not optimized out for vmlal intrinsics
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

With gcc as of today I see dup instructions that could be optimized out:

$ cat red.c
#include "arm_neon.h"

int32x4_t fun(int32x4_t a, int16x8_t b, int16x8_t c) {
  a = vmlal_s16(a, vget_low_s16(b), vget_low_s16(c));
  a = vmlal_high_s16(a, b, c);
  return a;
}

$ gcc -O3 -S -o- red.c
fun:
dup d3, v1.d[0]
dup d4, v2.d[0]
smlal v0.4s,v3.4h,v4.4h
smlal2 v0.4s,v1.8h,v2.8h
ret

$ clang -O3 -S -o- red.c
fun:
smlal   v0.4s, v1.4h, v2.4h
smlal2  v0.4s, v1.8h, v2.8h
ret

[Bug tree-optimization/86865] [9 Regression] Wrong code w/ -O2 -floop-parallelize-all -fstack-reuse=none -fwrapv -fno-tree-ch -fno-tree-dce -fno-tree-dominator-opts -fno-tree-loop-ivcanon

2019-01-24 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86865

Sebastian Pop  changed:

   What|Removed |Added

 CC||spop at gcc dot gnu.org

--- Comment #7 from Sebastian Pop  ---
I think the patch is ok.

If in the future we want to handle those other loops, we will need to compute
the loop bound in add_loop_constraints() with a check for whether the stmt is
dominated by the exit or not.

Here is what we do today for all stmts in the loop:

  tree nb_iters = number_of_latch_executions (loop);
  if (TREE_CODE (nb_iters) == INTEGER_CST)
{
  /* loop_i <= cst_nb_iters */

The constraint '<=' on the statements' iteration domains implies that the loop
should be in do-while form.
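
To illustrate the remark about the do-while form, here is a small sketch in C.
It assumes that the number of latch executions is the number of times the back
edge is taken, so a statement placed before the exit test runs once more than
the latch, while a statement placed after the exit test runs exactly as often
as the latch.  The function name and counters are illustrative and are not GCC
internals.

```c
#include <assert.h>

/* Count how often a statement before the exit test (body_runs) and the
   back edge after the exit test (latch_runs) execute.  */
void count_do_while(int n, int *body_runs, int *latch_runs)
{
  int i = 0;
  *body_runs = 0;
  *latch_runs = 0;
  do
    {
      (*body_runs)++;      /* statement not dominated by the exit */
      i++;
      if (i >= n)
        break;             /* loop exit */
      (*latch_runs)++;     /* statement dominated by the exit test */
    }
  while (1);
}

/* Small self-check that can be called from main ().  */
void check_do_while(void)
{
  int body, latch;
  count_do_while(4, &body, &latch);
  assert(body == latch + 1);
}
```

With n = 4 the statement before the exit runs for 0 <= i <= 3, that is
latch_runs + 1 times, which is the case the '<=' constraint describes.  A
statement after the exit test would need a strict bound instead.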

[Bug tree-optimization/87917] ICE in initialize_matrix_A at gcc/tree-data-ref.c:3150

2018-11-08 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87917

Sebastian Pop  changed:

   What|Removed |Added

 CC||spop at gcc dot gnu.org

--- Comment #3 from Sebastian Pop  ---
> Sebastian - can you say if
> evolution_function_is_affine_multivariate_p ({0, +, {0, +, 4}_1}_2, 1)
> should really return true?

You are right, {0, +, {0, +, 4}_1}_2 is not a valid affine multivariate
function: only the base (not the step) should vary in an outer loop.

For example, this would be an affine multivariate: {{0, +, 4}_1, +, 42}_2.
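
As a sketch of what the valid form means, the following C code builds a
variable whose evolution is {{0, +, 4}_1, +, 42}_2 and compares it with the
closed form 4*i1 + 42*i2.  It assumes that loop 1 is the outer loop and loop 2
is nested inside it; the function names are made up for illustration.

```c
#include <assert.h>

/* Closed form of {{0, +, 4}_1, +, 42}_2 at iteration i1 of loop 1 and
   iteration i2 of loop 2.  */
long chrec_value(long i1, long i2)
{
  return 4 * i1 + 42 * i2;
}

/* Walk a loop nest that produces the evolution step by step and count
   how many times it disagrees with the closed form.  */
int count_chrec_mismatches(long n1, long n2)
{
  int mismatches = 0;
  long base = 0;                          /* {0, +, 4}_1 */
  for (long i1 = 0; i1 < n1; i1++)        /* loop 1 (assumed outer) */
    {
      long x = base;
      for (long i2 = 0; i2 < n2; i2++)    /* loop 2 (assumed inner) */
        {
          if (x != chrec_value(i1, i2))
            mismatches++;
          x += 42;                        /* constant step in loop 2 */
        }
      base += 4;                          /* only the base varies in loop 1 */
    }
  return mismatches;
}
```

The step in loop 2 is the constant 42, so the function is affine in both
iteration counters.  In {0, +, {0, +, 4}_1}_2 the step itself would change
with i1, which gives a product term i1*i2 and is no longer affine.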

[Bug tree-optimization/82449] code-gen error in get_rename_from_scev

2017-10-06 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82449

--- Comment #2 from Sebastian Pop  ---
This part is not affine: {0, +, {1, +, 1}_1}_1
This is a polynomial of degree 2.
Are you sure the scev analysis reports this as affine?

I was trying to understand from the Fortran code which part this scev comes
from, and I think it comes from the NKL counter that gets incremented in the
inner loop, counting the number of iterations of both loops, so it has a
quadratic evolution.
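
A minimal sketch in C of why {0, +, {1, +, 1}_1}_1 has degree 2: the step
grows by 1 on every iteration, so the value after n iterations is
n*(n+1)/2.  The second function shows a counter in a triangular loop nest that
has this evolution; it is a hypothetical illustration and not the Fortran loop
from the test case.

```c
#include <assert.h>

/* Value of {0, +, {1, +, 1}_1}_1 after n iterations, computed by
   stepping: the step itself is the chrec {1, +, 1}_1.  */
long quadratic_chrec(long n)
{
  long value = 0;   /* initial value 0 */
  long step = 1;    /* initial step 1, incremented by 1 each iteration */
  for (long i = 0; i < n; i++)
    {
      value += step;
      step += 1;
    }
  return value;
}

/* Hypothetical counter incremented in an inner loop whose trip count
   grows with the outer iteration.  Returns the counter after n outer
   iterations.  */
long triangular_counter(long n)
{
  long counter = 0;
  for (long j = 0; j < n; j++)
    for (long k = 0; k <= j; k++)
      counter++;
  return counter;
}
```

Both functions return n*(n+1)/2, a polynomial of degree 2 in n, so a
dependence test that expects affine functions cannot handle it.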

[Bug tree-optimization/69728] [6/7 Regression] internal compiler error: in outer_projection_mupa, at graphite-sese-to-poly.c:1175

2017-09-20 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69728

--- Comment #22 from Sebastian Pop  ---
> I put it on my TODO to figure out how to "DCE" a stmt
> (or in this case it's rather the whole "loop body", right?).

The code generator would not even see a statement to be generated: it would
just disappear in the new code, so there is nothing to do to DCE statements
with empty domains.

> I've not fully found my way through initial schedule building yet
> (otherwise I would have tried refactoring to not operate in pbb
> vector order but more naturally follow the SESE in a CFG walk with
> maintaining a BB -> pbb mapping).

Yes, DOM-walk could be used to detect the sequential order in which basic
blocks are executed.

There are some difficulties in giving an execution sequence number for if-then
and if-else clauses, and for switch cases: for the moment we represent them as
executing in sequence.  For example,

if (c)
  a;
else
  b;

we would number the stmts a and b as if the code looked like this:

if (c)
  a;
if (!c)
  b;

which is correct.
The fact that the constraint "c" is added to the iteration domain of "a", and
"!c" added to the iter domain of "b" allows the scheduler to know that there
are no sequential dependences between stmts "a" and "b" as they are executed in
different iterations.
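
The equivalence of the two forms can be checked with a small sketch in C.  It
assumes that statement "a" does not change the condition "c", which holds when
"c" is an SSA name; the function names are made up for illustration.

```c
#include <assert.h>

/* Original form: returns which statement ran, 'a' or 'b'.  */
int run_if_else(int c)
{
  if (c)
    return 'a';
  else
    return 'b';
}

/* Sequential form used for numbering: two guarded statements in a row.
   Exactly one of the guards holds, so exactly one statement runs.  */
int run_two_ifs(int c)
{
  int ran = 0;
  if (c)
    ran = 'a';
  if (!c)
    ran = 'b';
  return ran;
}
```

For every value of c both functions run the same statement, which matches the
view that "a" has c in its iteration domain and "b" has !c.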

[Bug tree-optimization/81373] [7/8 Regression] Graphite ICE in ssa_default_def at gcc/tree-dfa.c:305

2017-09-19 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81373

--- Comment #4 from Sebastian Pop  ---
The patch looks good.  Thanks!

[Bug tree-optimization/79622] [6/7 Regression] Wrong code w/ -O2 -floop-nest-optimize

2017-09-19 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79622

--- Comment #10 from Sebastian Pop  ---
> So a black-box would be a set of stmts rather than a whole GIMPLE BB

Correct: this can be an abstract view of the IR.  The only place where we want
to start transforming the code is in the code generation.  We should be able to
interrupt graphite at any point (maybe due to a compute-out) and leave the
original unmodified IR.  Code generation should not fail and it should be
linear time in number of statements, such that when we start code generation we
know that it will succeed in a short amount of compilation time.

> You mean this tagging of associativeness is not yet done?

Yes, we removed the tagging code when we removed the out-of-ssa translation.
The original tagging relied on the name of the arrays that we created to find
whether the reduction was associative.  This caused some performance
regressions of loops not interchanged anymore (for example the swim loop.)

[Bug tree-optimization/69728] [6/7/8 Regression] internal compiler error: in outer_projection_mupa, at graphite-sese-to-poly.c:1175

2017-09-19 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69728

--- Comment #19 from Sebastian Pop  ---
> So how'd we properly handle a valid empty domain?

DCE the statement.

If the domain for a statement is empty, it means that the statement does not
execute: it is dead code.

I think we are better off enforcing the elimination of the statement, as this
wrong analysis (or translation) of the number of iterations could produce wrong
code.

> I assume P_21 is c.7_12

the number after P_ is the ssa variable number, so P_21 is c.7_21.

> we have 0 <= i1 <= 2147483637, whereever that comes from.

you can think about i1 as a canonical induction variable: 0 <= i1
and i1 is indexing all iterations in that loop: i.e., i1 is incremented by 1.

> Probably from the i1 <= 2147483637 constraint

this constraint is added based on the type of the induction variable that gives
an upper bound for the iteration domain.

> 4294967296*floor((-1 - P_21)/4294967296) < -P_21 - i1

Yes, this constraint seems to be wrong.

[Bug tree-optimization/79622] [6/7 Regression] Wrong code w/ -O2 -floop-nest-optimize

2017-09-18 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79622

--- Comment #8 from Sebastian Pop  ---
> I would have expected at least each memory op to be in a separate "black box"

We could have a pass before graphite that splits BBs with more than one write
into blocks that contain one data write with all the operations and data reads
needed to compute the stored value.  This would allow more freedom to schedule
BBs around.

> if you follow the original go-out-of-SSA approach you'd have their effects
> on the CFG edges.  So a more complete fix would similarly handle uses.

In other words: how do we handle reductions?
As you remember, the original way was to expose reductions by rewriting
out-of-SSA scalar dependences crossing basic blocks (loop-phi nodes,
loop-close-phi nodes), tagging the properties of the reduction (commutative,
associative) on the array, and adding that info to the data dependence graph.
By adding those properties to the dependence graph, we give the scheduler
more freedom to select transforms.

We moved away from rewriting scalar dependences out-of-SSA because we do not
want to transform the code if the scheduler has no better transform to be done:
we do not want to leave around inefficient memory reads/writes.
Instead, we handle SSA names and create scalar references added to the
dependence graph.  We still need to tag scalar reductions with their
associative properties to allow the scheduler to reorder the computations.

[Bug tree-optimization/69728] [6/7/8 Regression] internal compiler error: in outer_projection_mupa, at graphite-sese-to-poly.c:1175

2017-09-18 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69728

--- Comment #15 from Sebastian Pop  ---
It makes sense to fail early when the schedule builder gets confused and builds
an empty domain.  Could you please also add a comment around the if that sets
schedule_error?  The change looks good.  Thanks.

[Bug tree-optimization/79622] [6/7/8 Regression] Wrong code w/ -O2 -floop-nest-optimize

2017-09-14 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79622

--- Comment #4 from Sebastian Pop  ---
Yes, that phi node looks like a reduction.  We need to handle the phi as a
write to expose the loop carried reduction variable to the dependence analysis.
I think your change goes in the right direction.  Thanks!

[Bug tree-optimization/68823] [6/7/8 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes

2017-09-14 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823

--- Comment #15 from Sebastian Pop  ---
> when DR_NUM_DIMENSIONS (dr1->dr) != DR_NUM_DIMENSIONS (dr2->dr) better "FAIL"?

Yes.
The patch looks good to me.

[Bug ipa/65972] ICE after applying a patch to enable verify_ssa with auto-pgo

2017-04-18 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972

--- Comment #9 from Sebastian Pop  ---
In the link in the previous comment, Richi has a similar patch as suggested by
Dehao pending review/test/commit: let's close this bug when Richi's patch lands
in trunk.

[Bug ipa/65972] ICE after applying a patch to enable verify_ssa with auto-pgo

2017-04-18 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972

--- Comment #8 from Sebastian Pop  ---
Yes please!
This patch also solves the problem I was chasing a week or so ago:

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00067.html

I also know that this is ICE-ing on a large proprietary project when I compile
it with autoFDO on gcc-5.x and 6.x releases.

[Bug driver/79637] missing documentation for PARAM_MAX_FSM_THREAD_LENGTH

2017-04-18 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79637

--- Comment #3 from Sebastian Pop  ---
As to why we call it a "finite state automaton" jump threading: this transform
proves useful when the switch statement in the previous example is contained in
a loop, which is the way most people implement a parser or a finite state
machine.  Some of these automata implement state transitions by setting the
next state in one of the cases.  To continue the example from the previous
comment, here is what a two-state machine looks like:

c = 1;
while (1)
  {
    switch (c)
      {
      case 1:
        c = 5;
        break;
      case 5:
        c = 1;
        break;
      }
  }

and after jump threading, it would look like this:

c = 1;
label1:
  c = 5;
  goto label2;

label2:
  c = 1;
  goto label1;

which is much faster than having to take the loop back-edge + jump from switch
to case.
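
The two forms can be compared with a runnable sketch in C.  The step counter is
an addition made here so that both versions terminate and can be checked
against each other; it is not part of the example above, and the function names
are made up for illustration.

```c
#include <assert.h>

/* Original form: loop + switch.  Returns the state after `steps`
   transitions.  */
int run_switch(int steps)
{
  int c = 1;
  while (steps-- > 0)
    {
      switch (c)
        {
        case 1:
          c = 5;
          break;
        case 5:
          c = 1;
          break;
        }
    }
  return c;
}

/* Threaded form: each state jumps directly to the code of its successor,
   without going back through the switch.  */
int run_threaded(int steps)
{
  int c = 1;
label1:
  if (steps-- <= 0)
    return c;
  c = 5;
  goto label2;
label2:
  if (steps-- <= 0)
    return c;
  c = 1;
  goto label1;
}
```

Both functions return the same state for any number of steps; the threaded one
never evaluates the switch condition.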

[Bug driver/79637] missing documentation for PARAM_MAX_FSM_THREAD_LENGTH

2017-04-18 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79637

--- Comment #2 from Sebastian Pop  ---
Here is what I see in doc/invoke.texi:

@item max-fsm-thread-path-insns
Maximum number of instructions to copy when duplicating blocks on a
finite state automaton jump thread path.  The default is 100.

@item max-fsm-thread-length
Maximum number of basic blocks on a finite state automaton jump thread
path.  The default is 10.

@item max-fsm-thread-paths
Maximum number of new jump thread paths to create for a finite state
automaton.  The default is 50.

I think these parameters are quite technical.  The rule is that all the magic
constants should have a param instead of being hard-coded, so they get exposed
to the users of the compiler that way.

Roland, I would have liked to point you to a paper that describes the algorithm
for backwards jump-threading, although we have not written one yet.  Jeff, I
think it would be good if I take the time to write that paper, and I will ask
you, James, and Brian to co-sign the paper.

Here is a short description of how the backwards jump-threading works:

We start by looking for a switch or condition statement of the form
"switch(c)".  Then we follow the SSA definitions backwards from "c" until we
reach a place in the program where the value of "c" is statically known at
compile time.  To make the example simple, let's say we reach a statement that
sets "c = 5".  With that information in hand, we create a new path that starts
from the basic block that sets "c = 5" and ends in the target block of the
switch "case 5:".  This is done by duplicating all the basic blocks on the path
from "c = 5" to the target of the now known value of the condition.
max-fsm-thread-length is the bound on the number of basic blocks on that path,
so that we do not increase the code size of the program too much.

[Bug tree-optimization/69675] [6/7 Regression] [graphite] ICE: verify_ssa failed (definition in block 42 does not dominate use in block 34)

2017-02-09 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69675

--- Comment #10 from Sebastian Pop  ---
(In reply to Richard Biener from comment #9)
> Yeah, seems to be gone with ISL 0.18 here as well... (but with 0.16.1 I can
> still reproduce it).  ISL 0.18 doesn't do anything to the loop.  ISL 0.16.1
> just did some IV transforms it seems:
> 
> [scheduler] original ast:
> for (int c0 = 0; c0 <= -P_14; c0 += 1)
>   for (int c1 = 0; c1 <= 3; c1 += 1) {
> S_5(c0, c1);
> if (c1 <= 2)
>   S_6(c0, c1);
>   }
> 
> [scheduler] AST generated by isl:
> for (int c0 = 0; c0 <= -P_14; c0 += 1)
>   for (int c1 = 3 * c0; c1 <= 3 * c0 + 3; c1 += 1) {
> S_5(c0, -3 * c0 + c1);

I don't know why isl started the inner loop at 3*c0:
In the end we have the identity for the array subscript: 3*c0 - 3*c0 = 0
Could be a bug in the older isl.

> if (3 * c0 + 2 >= c1)
>   S_6(c0, -3 * c0 + c1);
>   }
> 
> and with ISL 0.18 we have
> 
> [scheduler] isl optimized schedule is identical to the original schedule.
> for (int c0 = 0; c0 <= -P_14; c0 += 1)
>   for (int c1 = 0; c1 <= 3; c1 += 1) {
> S_5(c0, c1);
> if (c1 <= 2)
>   S_6(c0, c1);
>   }
> 
> and eventually code generation is not happy with the changed form
> (-fgraphite-identity is fine).
> 
> Sebastian, any comment?  I think we could still for example require current
> ISL
> for GCC 6 (0.18 or maybe 0.17.1).  Or at least drop support for the current
> legacy.

I would like to move away from the older isl versions: newer isl versions have
fewer bugs, and people have also worked on making isl faster.
Moving to a newer isl would also allow us to clean up the #ifdef's in the
graphite-*.c files, which will make the code easier to read.

[Bug tree-optimization/68823] [6/7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes

2017-02-08 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823

--- Comment #11 from Sebastian Pop  ---
(In reply to Richard Biener from comment #10)
> But then with different number of subscripts (and also likely different
> DR_BASE_OBJECT) you can't do anything with them and have to assume
> dependence.  See initialize_data_dependence_relation:
> 
>   /* If the references do not access the same object, we do not know
>  whether they alias or not.  We do not care about TBAA or alignment
>  info so we can use OEP_ADDRESS_OF to avoid false negatives.
>  But the accesses have to use compatible types as otherwise the
>  built indices would not match.  */
>   if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b),
> OEP_ADDRESS_OF)
>   || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)),
>   TREE_TYPE (DR_BASE_OBJECT (b
> {
>   DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>   return res;
> 
> not sure how you communicate that to ISL of course...  is it what you
> use "alias-sets" for?  To create extra dependence egdes?

alias-sets differ for two arrays with bases that have been proven to be
different.
If they may point to the same thing, they will have the same number.

[Bug tree-optimization/68823] [6/7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes

2017-02-03 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823

--- Comment #9 from Sebastian Pop  ---
/* Determines the base object and the list of indices of memory reference
   DR, analyzed in LOOP and instantiated in loop nest NEST.  */

static void
dr_analyze_indices (struct data_reference *dr, loop_p nest, loop_p loop)

This function initializes the subscripts with their access functions:
  DR_ACCESS_FNS (dr) = access_fns;

The number of subscripts (or "dimensions") is then the length of that array:
#define DR_NUM_DIMENSIONS(DR)  DR_ACCESS_FNS (DR).length ()

[Bug tree-optimization/68823] [6/7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes

2017-02-03 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823

--- Comment #8 from Sebastian Pop  ---
The code at fault is called from pdr_add_memory_accesses().
Maybe the problem is in parsing the gimple MEM[] into a data reference.

[Bug tree-optimization/68823] [6/7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes

2017-02-03 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823

--- Comment #7 from Sebastian Pop  ---
(In reply to Martin Liška from comment #5)
> Created attachment 40662 [details]
> Isolated graphite dump for miscompiled function
> 
> As shown in the dump file, there are dependencies for the problematic stmts:
> 
> Adding must write to depedence graph: pdr_121 (write 
> in gimple stmt: MEM[(Element_t[2] &)_7][0] = _9;
> data accesses: { S_3[i2] -> [2, o1, 0] : 8*floor((o1)/8) = o1 and
> 18446744073709551616*floor((8i2 - o1)/18446744073709551616) = 8i2 - o1 and 0
> <= o1 <= 18446744073709551608 }
> 
> Adding read to depedence graph: pdr_124 (read 
> in gimple stmt: _15 = MEM[(int *)_14];
> data accesses: { S_6[i1] -> [2, o1] : 18446744073709551616*floor((-8i1 +
> o1)/18446744073709551616) = -8i1 + o1 and 0 <= o1 <= 18446744073709551608 }
> 
> If I understand the notation correctly, both have equal alias set (2). Do
> you see Sebastian why the dependence is not caught?
> 

S_3[i2] -> [2, o1, 0]
S_6[i1] -> [2, o1]

We do not detect the dependence because the two arrays do not have the same
number of subscripts.  The gimple representation shows the same mismatch:

MEM[(Element_t[2] &)_7][0] = _9;
vs.
_15 = MEM[(int *)_14];

[Bug tree-optimization/68823] [6/7 Regression][graphite] tramp3d-v4 compiled with -floop-nest-optimize crashes

2017-02-02 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823

--- Comment #4 from Sebastian Pop  ---
The data dependence relations are dumped in the output of
-fdump-tree-graphite-all.
graphite-dependences.c contains the code for the data dependence computations.
Looking at the gimple code it seems like a trivial write after write
dependence.

Do we have a reduced testcase for this problem?

[Bug tree-optimization/77362] [6/7 Regression] [graphite] ICE in sese_build_liveouts_use w/ -O2 -floop-nest-optimize

2017-01-31 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77362

--- Comment #10 from Sebastian Pop  ---
(In reply to Richard Biener from comment #9)
> Yeah, but the user can write such dependences himself so ideally we have
> a way to undo them, like by using local scratch memory?  So

You are right.  LLVM-Polly has a pass that undoes LIM; it is non-trivial, and
furthermore we'd better catch the LIM once the loop transforms are done!

> 
>   x_0 = 1;
> 
> loop:
>   # x_1 = PHI <x_0, x_2>
>   ...
>   x_2 = ...;
>   goto loop;
> 
> turns into
> 
>   mem = 1;
>   
> loop:
>   x_1 = mem;
>   x_2 = ...;
>   mem = x_2;
>   goto loop;
> 
> plus replacement of exit PHIs with loads.  Would that help?

That's how we were handling reductions and end of loop values in the dependence
graph.  Today we can reason about scalars themselves and add the scalars to the
dependence graph instead of generating the loads that would need to be cleaned
up after graphite.

[Bug tree-optimization/77362] [6/7 Regression] [graphite] ICE in sese_build_liveouts_use w/ -O2 -floop-nest-optimize

2017-01-31 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77362

--- Comment #8 from Sebastian Pop  ---
LIM in general is bad for loop transforms: it introduces loop carried
dependences. If we can move graphite before LIM that would solve some problems.

[Bug tree-optimization/77362] [6/7 Regression] [graphite] ICE in sese_build_liveouts_use w/ -O2 -floop-nest-optimize

2017-01-31 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77362

--- Comment #7 from Sebastian Pop  ---
The fix looks good.  Thanks!

[Bug tree-optimization/77605] [5/6/7 Regression] wrong code at -O3 on x86_64-linux-gnu

2016-09-16 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77605

--- Comment #6 from Sebastian Pop  ---
The proposed change looks good to me.

"last_conflicts" is the max index in the conflicting functions for which there
is a dependence:

mem_access_a (conflicting_iterations_in_a (last_conflicts)) is in dependence
with
mem_access_b (conflicting_iterations_in_b (last_conflicts)).

[Bug tree-optimization/70956] ICE in build_cross_bb_scalars_def, at graphite-scop-detection.c:1725

2016-05-05 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70956

--- Comment #2 from Sebastian Pop  ---
The change looks good to me.

[Bug middle-end/70159] missed CSE optimization

2016-03-10 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159

--- Comment #9 from Sebastian Pop  ---
Created attachment 37927
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37927&action=edit
patch for hoisting expressions

Updated the patch from PR23286 to hoist the redundant expressions:

  <bb 2>:
  inv_4 = 1.0e+0 / d_3(D);
  _18 = min_5(D) - a_6(D);
  _19 = _18 / inv_4;
  _20 = max_9(D) - a_6(D);
  _21 = _20 / inv_4;
  if (inv_4 >= 0.0)
    goto <bb 4>;
  else
    goto <bb 3>;

  <bb 3>:

  <bb 4>:
  # tmin_1 = PHI <_19(2), _21(3)>
  # tmax_2 = PHI <_21(2), _19(3)>
  _16 = tmin_1 + tmax_2;
  return _16;

The attached patch does not pass make check and causes some infinite recursion.

[Bug middle-end/70159] missed CSE optimization

2016-03-09 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159

--- Comment #7 from Sebastian Pop  ---
(In reply to Andrew Pinski from comment #6)
> Note this is both a hoisting and a sinking issue.
> Hoisting should happen before sinking.
> LLVM looks like it only implements sinking.

You are right: LLVM does sinking very early as part of instcombine: it
transforms the phi nodes after the if into selects over the operands and sinks
the sub and mul after the select.  By the time other redundancy elimination
passes are executed the shape of the code is more difficult to optimize.

[Bug middle-end/70159] missed CSE optimization

2016-03-09 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159

--- Comment #2 from Sebastian Pop  ---
Right, with -Ofast it would be able to optimize away the branch or selects.
The original benchmark had something more complex than fadd to use the tmin and
tmax results. Here is one more test using the results in a non-commutative
operation:

bool foo_p(float d, float min, float max, float a)
{
  float tmin;
  float tmax;

  float inv = 1.0f / d;
  if (inv >= 0) {
    tmin = (min - a) * inv;
    tmax = (max - a) * inv;
  } else {
    tmin = (max - a) * inv;
    tmax = (min - a) * inv;
  }

  return tmax > tmin;
}

[Bug middle-end/70159] New: missed CSE optimization

2016-03-09 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159

Bug ID: 70159
   Summary: missed CSE optimization
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

$ cat h.c
float foo_p(float d, float min, float max, float a)
{
  float tmin;
  float tmax;

  float inv = 1.0f / d;
  if (inv >= 0) {
    tmin = (min - a) * inv;
    tmax = (max - a) * inv;
  } else {
    tmin = (max - a) * inv;
    tmax = (min - a) * inv;
  }

  return tmax + tmin;
}

$ gcc h.c -Ofast -S -o- 
foo_p:
        fmov    s4, 1.0e+0
        fdiv    s0, s4, s0
        fcmpe   s0, #0.0
        blt     .L6
        fsub    s1, s1, s3
        fsub    s2, s2, s3
        fmul    s1, s1, s0
        fmul    s0, s2, s0
        fadd    s0, s1, s0
        ret
        .p2align 3
.L6:
        fsub    s4, s2, s3
        fsub    s2, s1, s3
        fmul    s1, s4, s0
        fmul    s0, s2, s0
        fadd    s0, s1, s0
        ret

$ clang h.c -Ofast -S -o-
foo_p:                                  // @foo_p
// BB#0:                                // %entry
        fmov    s4, #1.
        fdiv    s0, s4, s0
        fcmp    s0, #0.0
        fcsel   s4, s1, s2, lt
        fcsel   s1, s2, s1, lt
        fsub    s1, s1, s3
        fsub    s2, s4, s3
        fadd    s1, s2, s1
        fmul    s0, s1, s0
        ret

The computations in both branches are redundant.
Even without if-conversion (fcsel), GCC should be able to sink/hoist fsub and
fmul.

[Bug middle-end/69545] [6 Regression] FAIL: gfortran.dg/graphite/pr42285.f90 -O (internal compiler error)

2016-01-28 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69545

Sebastian Pop  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |spop at gcc dot gnu.org

--- Comment #1 from Sebastian Pop  ---
I guess this issue is due to isl-0.14.  With isl-0.15 it is passing.
I will have a look.

[Bug middle-end/69545] [6 Regression] FAIL: gfortran.dg/graphite/pr42285.f90 -O (internal compiler error)

2016-01-28 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69545

Sebastian Pop  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Sebastian Pop  ---
Fixed in r232966 by reverting r232939.

[Bug tree-optimization/68343] FAIL: gcc.dg/graphite/fuse-{1,2}.c scan-tree-dumps

2016-01-25 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68343

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Sebastian Pop  ---
fixed in r232811.

[Bug tree-optimization/68398] [6 Regression] coremark regression due to r229685

2016-01-23 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68398

--- Comment #4 from Sebastian Pop  ---
Thanks Jeff for looking into this issue.
I was thinking about a heuristic as you mentioned in comment #2:
what about allowing creation of irreducible loops, multiple latches, etc. after
the loop optimizers are done?

[Bug tree-optimization/69341] [6 Regression] [graphite] ICE: verify_ssa failed (error: definition in block 37 does not dominate use in block 30)

2016-01-22 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69341
Bug 69341 depends on bug 68692, which changed state.

Bug 68692 Summary: [6 Regression][graphite] ice: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68692

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/68692] [6 Regression][graphite] ice: Segmentation fault

2016-01-22 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68692

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Sebastian Pop  ---
fixed in r232659.

[Bug tree-optimization/69292] [6 Regression][graphite] ICE with -floop-nest-optimize

2016-01-22 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69292

Sebastian Pop  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #6 from Sebastian Pop  ---
fixed at r232659

*** This bug has been marked as a duplicate of bug 68692 ***

[Bug tree-optimization/68692] [6 Regression][graphite] ice: Segmentation fault

2016-01-22 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68692

--- Comment #9 from Sebastian Pop  ---
*** Bug 69292 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/68976] [6 Regression] ICE w/ -O2 (and above) -fgraphite-identity (or -floop-nest-optimize)

2016-01-22 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68976

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #16 from Sebastian Pop  ---
fixed in r232658.

[Bug tree-optimization/68756] [6 Regression] ICE w/ -O1 -floop-nest-optimize and isl 0.15: isl-0.15/isl_id.c:213: unable to find id

2015-12-16 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68756

Sebastian Pop  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |spop at gcc dot gnu.org

--- Comment #2 from Sebastian Pop  ---
Thanks for the nice reduced testcase.  I will have a look.

[Bug tree-optimization/68659] [6 regression] FAIL: gcc.dg/graphite/id-pr45230-1.c (internal compiler error)

2015-12-07 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68659

Sebastian Pop  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|DUPLICATE   |---

--- Comment #10 from Sebastian Pop  ---
Thanks for reporting.
I will have a look at what happens on arm.

[Bug bootstrap/68667] [6 Regression] GCC trunk build fails compiling graphite-isl-ast-to-gimple.c

2015-12-04 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68667

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Sebastian Pop  ---
fixed in r231223

[Bug tree-optimization/68692] [6 Regression] ice: Segmentation fault

2015-12-04 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68692

Sebastian Pop  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |spop at gcc dot gnu.org

--- Comment #3 from Sebastian Pop  ---
I'm looking at it.

[Bug tree-optimization/68693] [6 Regression] ice: in harmful_stmt_in_region, at graphite-scop-detection.c:1052

2015-12-04 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68693

--- Comment #3 from Sebastian Pop  ---
Author: spop
Date: Fri Dec  4 21:36:55 2015
New Revision: 231309

URL: https://gcc.gnu.org/viewcvs?rev=231309&root=gcc&view=rev
Log:
fix PR68693: Check for loop structure when extending the SCoP

The check for dominance while extending the scop assumed that
multiple successors meant a loop which is not true in case of
conditionals around the loop.

Improved pretty printers for better debugging.

PR tree-optimization/68693
* graphite-scop-detection.c (dot_all_sese): New
(dot_all_scops_1): Renamed to dot_all_sese.
(dot_all_scops): Removed.
(dot_sese): New.
(dot_cfg): New.
(scop_detection::get_nearest_dom_with_single_entry): Check that preds
are from different loop levels.
(scop_detection::get_nearest_pdom_with_single_exit): Check that succs
are from different loop levels.
(scop_detection::print_sese): Inlined.
(scop_detection::print_edge): New.
(scop_detection::merge_sese): Added dumps.
* graphite.h: Add declarations.

gcc/testsuite/ChangeLog:

* gfortran.dg/graphite/pr68693.f90: New test.

Added:
trunk/gcc/testsuite/gfortran.dg/graphite/pr68693.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/graphite-scop-detection.c
trunk/gcc/graphite.h
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/68693] [6 Regression] ice: in harmful_stmt_in_region, at graphite-scop-detection.c:1052

2015-12-04 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68693

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Sebastian Pop  ---
fixed

[Bug tree-optimization/68550] [6 Regression] ICE: verify_gimple failed Error: missing PHI def

2015-12-03 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68550

Sebastian Pop  changed:

   What|Removed |Added

 CC||sch...@linux-m68k.org

--- Comment #6 from Sebastian Pop  ---
*** Bug 68659 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/68659] [6 regression] FAIL: gcc.dg/graphite/id-pr45230-1.c (internal compiler error)

2015-12-03 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68659

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #8 from Sebastian Pop  ---
Most likely fixed in r231206.

*** This bug has been marked as a duplicate of bug 68550 ***

[Bug tree-optimization/68659] [6 regression] FAIL: gcc.dg/graphite/id-pr45230-1.c (internal compiler error)

2015-12-03 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68659

--- Comment #6 from Sebastian Pop  ---
I do not see the error on today's trunk at r231233.  Could you please verify
that this has been fixed by our changes from yesterday?

Thanks!

[Bug tree-optimization/68550] [6 Regression] ICE: verify_gimple failed Error: missing PHI def

2015-12-02 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68550

Sebastian Pop  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |spop at gcc dot gnu.org

--- Comment #3 from Sebastian Pop  ---
Looking at it.  Thanks for the nice reduced testcases.

[Bug tree-optimization/68550] [6 Regression] ICE: verify_gimple failed Error: missing PHI def

2015-12-02 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68550

--- Comment #4 from Sebastian Pop  ---
Author: spop
Date: Wed Dec  2 20:40:17 2015
New Revision: 231206

URL: https://gcc.gnu.org/viewcvs?rev=231206&root=gcc&view=rev
Log:
fix PR68550: do not handle ISL loop peeled statements

In case ISL did some loop peeling, like this:

  S_8(0);
  for (int c1 = 1; c1 <= 5; c1 += 1) {
S_8(c1);
  }
  S_8(6);

we should not copy loop-phi nodes in S_8(0) or in S_8(6).

PR tree-optimization/68550
* graphite-isl-ast-to-gimple.c (copy_loop_phi_nodes): Add dump.
(copy_bb_and_scalar_dependences): Do not code generate loop peeled
statements.

* gfortran.dg/graphite/pr68550-1.f90: New.
* gfortran.dg/graphite/pr68550-2.f90: New.

Added:
trunk/gcc/testsuite/gfortran.dg/graphite/pr68550-1.f90
trunk/gcc/testsuite/gfortran.dg/graphite/pr68550-2.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/graphite-isl-ast-to-gimple.c
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/68550] [6 Regression] ICE: verify_gimple failed Error: missing PHI def

2015-12-02 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68550

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Sebastian Pop  ---
fixed

[Bug middle-end/68565] [6 Regression] graphite : -O2 -floop-nest-optimize miscompile

2015-11-30 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68565

--- Comment #2 from Sebastian Pop  ---
Author: spop
Date: Mon Nov 30 20:39:16 2015
New Revision: 231086

URL: https://gcc.gnu.org/viewcvs?rev=231086&root=gcc&view=rev
Log:
check for ISL generated code that leads to division by zero

we used to generate modulo and division by zero because ISL uses big numbers
which translate to zero in modulo arithmetic.  The patch also improves error
handling
and bails out early in case of wrong code gen.

PR tree-optimization/68565
* graphite-isl-ast-to-gimple.c (binary_op_to_tree): Early return on
codegen_error.  Fail when rhs of division operations is integer_zerop.
(ternary_op_to_tree): Early return on codegen_error.
(unary_op_to_tree): Same.
(nary_op_to_tree): Same.
(gcc_expression_from_isl_expr_op): Same.
(gcc_expression_from_isl_expression): Same.
(graphite_create_new_loop): On codegen_error continue generating
wrong code.
(graphite_create_new_loop_guard): Same.
(build_iv_mapping): Same.
(graphite_create_new_guard): Same.

* gfortran.dg/graphite/pr68565.f90: New.

Added:
trunk/gcc/testsuite/gfortran.dg/graphite/pr68565.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/graphite-isl-ast-to-gimple.c
trunk/gcc/testsuite/ChangeLog

[Bug middle-end/68565] [6 Regression] graphite : -O2 -floop-nest-optimize miscompile

2015-11-30 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68565

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Sebastian Pop  ---
Fixed.  Thanks for the testcase!

[Bug tree-optimization/68453] [6 Regression] graphite ICE: segfault

2015-11-26 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68453

Sebastian Pop  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Sebastian Pop  ---
fixed in r230918

[Bug tree-optimization/67984] [GRAPHITE] internal compiler error: isl_ctx freed, but some objects still reference it

2015-11-24 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67984

Sebastian Pop  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
  Known to work||6.0
 Resolution|--- |FIXED
  Known to fail||5.2.1

--- Comment #4 from Sebastian Pop  ---
Fixed in trunk gcc 6.0 at r230826.

[Bug tree-optimization/68493] [6 Regression] [graphite] ICE in copy_loop_phi_args

2015-11-23 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68493

Sebastian Pop  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Sebastian Pop  ---
fixed in r230772.

[Bug middle-end/68279] ICE: in create_pw_aff_from_tree, at graphite-sese-to-poly.c:836

2015-11-23 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68279

Sebastian Pop  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Sebastian Pop  ---
Fixed in r230771

[Bug middle-end/68314] [6 Regression] Invalid read in build_pbb_minimal_scattering_polyhedrons (graphite-sese-to-poly.c:148)

2015-11-23 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68314

--- Comment #2 from Sebastian Pop  ---
This patch exposes the problem without valgrind:

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 2054fad..b932dae 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -143,6 +143,9 @@ build_pbb_minimal_scattering_polyhedrons (isl_aff
*static_sched, poly_bb_p pbb,
  /* False for loop dimension.  */
  sequence_and_loop_dims[i + j] = false;
}
+
+  gcc_assert (nb_sequence_dim > j);
+
   /* Fake loops make things shifted by one.  */
   if (sequence_dims && sequence_dims[j] == i)
sequence_and_loop_dims[i + j] = true;

[Bug tree-optimization/67984] [GRAPHITE] internal compiler error: isl_ctx freed, but some objects still reference it

2015-11-23 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67984

Sebastian Pop  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2015-11-23
 CC||spop at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Sebastian Pop  ---
I cannot reproduce the error on GCC 6.0 trunk.
Also, please provide a reduced testcase, the attached testcase fails with:

In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/immintrin.h:43:0,
 from /usr/include/CL/cl_platform.h:441,
 from /usr/include/CL/cl.h:30,
 from /usr/include/CL/opencl.h:42,
 from dcttest.c:61:
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx2intrin.h: In function
‘_mm256_mpsadbw_epu8’:
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx2intrin.h:46:12: error: can’t
convert a value of type ‘int’ to vector type ‘__vector(4) long long int’ which
has different size

[Bug middle-end/68314] [6 Regression] Invalid read in build_pbb_minimal_scattering_polyhedrons (graphite-sese-to-poly.c:148)

2015-11-23 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68314

Sebastian Pop  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Sebastian Pop  ---
fixed in r230778

[Bug middle-end/68279] ICE: in create_pw_aff_from_tree, at graphite-sese-to-poly.c:836

2015-11-23 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68279

--- Comment #5 from Sebastian Pop  ---
After fixing the graphite failure, I get these warnings from the testcase in
comment #4:

FAIL: gfortran.dg/graphite/pr68279.f90   -O  (test for excess errors)
Excess errors:
/work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:21:19: Warning:
Legacy Extension: REAL array index at (1)
/work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:22:25: Warning:
Legacy Extension: REAL array index at (1)
/work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:22:41: Warning:
Legacy Extension: REAL array index at (1)
/work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:22:29: Warning:
Legacy Extension: REAL array index at (1)
/work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:22:75: Warning:
Legacy Extension: REAL array index at (1)
/work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:22:86: Warning:
Legacy Extension: REAL array index at (1)
/work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:24:27: Warning:
Legacy Extension: REAL array index at (1)
/work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:24:36: Warning:
Legacy Extension: REAL array index at (1)
/work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:25:16: Warning:
Legacy Extension: REAL array index at (1)
/work/spop/gcc/gcc/testsuite/gfortran.dg/graphite/pr68279.f90:25:34: Warning:
Legacy Extension: REAL array index at (1)

Is there a flag I can set to avoid these warnings?
Thanks!

[Bug tree-optimization/68493] [6 Regression] [graphite] ICE in copy_loop_phi_args

2015-11-23 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68493

--- Comment #1 from Sebastian Pop  ---
Passes on ISL 0.14, fails with 0.15.
This patch fixes it: we will bootstrap and commit.

diff --git a/gcc/graphite-isl-ast-to-gimple.c
b/gcc/graphite-isl-ast-to-gimple.c
index 30c3a21..2783ac4 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -2760,6 +2760,8 @@ translate_isl_ast_to_gimple::translate_pending_phi_nodes
()
  fprintf (dump_file, "[codegen] to new-phi: ");
  print_gimple_stmt (dump_file, new_phi, 0, 0);
}
+  if (codegen_error)
+   return;
 }
 }

[Bug middle-end/68314] [6 Regression] Invalid read in build_pbb_minimal_scattering_polyhedrons (graphite-sese-to-poly.c:148)

2015-11-23 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68314

Sebastian Pop  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |spop at gcc dot gnu.org

--- Comment #1 from Sebastian Pop  ---
Confirmed with ISL 0.15.
I'm looking at it.

[Bug tree-optimization/68453] [6 Regression] graphite ICE: segfault

2015-11-20 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68453

Sebastian Pop  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |spop at gcc dot gnu.org

--- Comment #2 from Sebastian Pop  ---
Confirmed.

[Bug tree-optimization/68335] [6 Regression][GRAPHITE] ICE: tree check: expected ssa_name, have real_cst in add_phi_arg_for_new_expr, at sese.c:1373

2015-11-19 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68335

--- Comment #4 from Sebastian Pop  ---
testcase added in r230630

[Bug tree-optimization/68428] [6 Regression] [graphite] ICE in outermost_loop_in_sese w/ -O2 -floop-strip-mine or -O2 -floop-nest-optimize

2015-11-19 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68428

Sebastian Pop  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||spop at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #1 from Sebastian Pop  ---
Fixed in r230632

[Bug tree-optimization/68335] [6 Regression][GRAPHITE] ICE: tree check: expected ssa_name, have real_cst in add_phi_arg_for_new_expr, at sese.c:1373

2015-11-19 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68335

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Sebastian Pop  ---
This is fixed in trunk as of today.  I will add the testcase.

[Bug tree-optimization/68341] [6 Regression] FAIL: gcc.dg/graphite/interchange-{1,11,13}.c (internal compiler error)

2015-11-19 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68341

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Sebastian Pop  ---
Fixed in r230631

[Bug tree-optimization/63602] [4.9/5 Regression] [graphite] Wrong code w/ -O2 -ftree-loop-nest-optimize

2015-11-18 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63602

Sebastian Pop  changed:

   What|Removed |Added

  Known to work||6.0
Summary|[4.9/5/6 Regression] Wrong  |[4.9/5 Regression]
   |code w/ -O2 |[graphite] Wrong code w/
   |-ftree-loop-nest-optimize   |-O2
   ||-ftree-loop-nest-optimize
  Known to fail|6.0 |

--- Comment #5 from Sebastian Pop  ---
gcc 6.0 trunk does not go out of SSA anymore: we rewrote graphite's code gen
and added all scalar dependences crossing basic blocks to the dependence graph.

[Bug tree-optimization/68398] New: coremark regression due to r229685

2015-11-17 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68398

Bug ID: 68398
   Summary: coremark regression due to r229685
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: spop at gcc dot gnu.org
  Target Milestone: ---

We have seen a performance regression due to r229685.
We see fewer FSM jump threads on the reduced testcase.

CC=2015-11-02-23-23-28-d3063db-trunk/bin/gcc
$CC -O3 m.c -fdump-tree-dom1-details=a -o a.out
CC=2015-11-02-23-25-06-f497d67-trunk/bin/gcc
$CC -O3 m.c -fdump-tree-dom1-details=b -o b.out

$ grep FSM a | wc -l
17

$ grep FSM b | wc -l
15

On x86_64, valgrind indicates that with the patch we execute 2.5% more
instructions:

+ valgrind --dsymutil=yes --tool=callgrind --callgrind-out-file=a.call ./a.out
==27524== Callgrind, a call-graph generating cache profiler
==27524== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al.
==27524== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==27524== Command: ./a.out
==27524== 
==27524== For interactive control, run 'callgrind_control -h'.
==27524== 
==27524== Events: Ir
==27524== Collected : 209839882
==27524== 
==27524== I   refs:  209,839,882
+ valgrind --dsymutil=yes --tool=callgrind --callgrind-out-file=b.call ./b.out
==27585== Callgrind, a call-graph generating cache profiler
==27585== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al.
==27585== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==27585== Command: ./b.out
==27585== 
==27585== For interactive control, run 'callgrind_control -h'.
==27585== 
==27585== Events: Ir
==27585== Collected : 213154557
==27585== 
==27585== I   refs:  213,154,557
+ callgrind_annotate a.call

Profile data file 'a.call' (creator: callgrind-3.10.0.SVN)

I1 cache: 
D1 cache: 
LL cache: 
Timerange: Basic block 0 - 46055772
Trigger: Program termination
Profiled target:  ./a.out (PID 27524, part 1)
Events recorded:  Ir
Events shown: Ir
Event sort order: Ir
Thresholds:   99
Include dirs: 
User annotated:   
Auto-annotation:  off


 Ir 

209,839,882  PROGRAM TOTALS


 Ir  file:function

138,250,035  ???:core_bench_list [a.out]
 69,160,889  ???:core_list_mergesort.constprop.2 [a.out]
  2,309,860  ???:core_list_init [a.out]

+ callgrind_annotate b.call

Profile data file 'b.call' (creator: callgrind-3.10.0.SVN)

I1 cache: 
D1 cache: 
LL cache: 
Timerange: Basic block 0 - 48409229
Trigger: Program termination
Profiled target:  ./b.out (PID 27585, part 1)
Events recorded:  Ir
Events shown: Ir
Event sort order: Ir
Thresholds:   99
Include dirs: 
User annotated:   
Auto-annotation:  off


 Ir 

213,154,557  PROGRAM TOTALS


 Ir  file:function

138,845,638  ???:core_bench_list [b.out]
 71,879,961  ???:core_list_mergesort.constprop.2 [b.out]
  2,309,860  ???:core_list_init [b.out]
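
As a cross-check of the callgrind numbers above, the relative increase in Ir
can be computed directly from the two runs. This is only a minimal sketch: it
assumes awk is available, and the helper name ir_increase_percent is
hypothetical, not something from the report.

```shell
# Hypothetical helper (not part of the report): relative increase, in percent,
# between two callgrind Ir counts.
ir_increase_percent() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.2f\n", (after - before) * 100 / before }'
}

# Ir counts copied from the callgrind_annotate output above (a.out, then b.out).
ir_increase_percent 209839882 213154557   # PROGRAM TOTALS
ir_increase_percent 138250035 138845638   # core_bench_list
ir_increase_percent 69160889 71879961     # core_list_mergesort.constprop.2
```

On these counts the script prints 1.58, 0.43 and 3.93, so most of the extra
instructions land in core_list_mergesort.constprop.2. The totals for this
reduced testcase do not reproduce the 2.5% quoted earlier; that figure may come
from a different run, which is an assumption since the report does not say.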

$ cat m.c
typedef struct list_data_s {
  short data16;
  short idx;
} list_data;

typedef struct list_head_s {
  struct list_head_s *next;
  struct list_data_s *info;
} list_head;

list_head *core_list_find(list_head *list,list_data *info);
list_head *core_list_reverse(list_head *list);
list_head *core_list_remove(list_head *item);
list_head *core_list_undo_remove(list_head *item_removed,
                                 list_head *item_modified);
list_head *core_list_insert_new(list_head *insert_point, list_data *info,
                                list_head **memblock, list_data **datablock,
                                list_head *memblock_end,
                                list_data *datablock_end);
typedef int(*list_cmp)(list_data *a, list_data *b);
list_head *core_list_mergesort(list_head *list, list_cmp cmp);

short state_scores[4] = {-29126, 24894, -24736, -272};

short matrix_scores[4] = {8151, -30381, -32453, 11169};

unsigned state_idx = 0, matrix_idx = 0;

short calc_func(short *pdata

[Bug tree-optimization/68343] FAIL: gcc.dg/graphite/fuse-{1,2}.c scan-tree-dumps

2015-11-13 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68343

Sebastian Pop  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |spop at gcc dot gnu.org

--- Comment #5 from Sebastian Pop  ---
You need ISL 0.15 to have these tests pass.
Could you please report which ISL version you configured gcc with?
I will try to add a check to graphite.exp so that the fuse-* tests are only
selected when gcc is configured with ISL 0.15 or later.

[Bug middle-end/68279] ICE: in create_pw_aff_from_tree, at graphite-sese-to-poly.c:836

2015-11-10 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68279

Sebastian Pop  changed:

   What|Removed |Added

 CC||spop at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |spop at gcc dot gnu.org

--- Comment #2 from Sebastian Pop  ---
I'll have a look.

[Bug tree-optimization/66070] [GRAPHITE] cc1 gets killed by OOM killer

2015-10-12 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66070

--- Comment #4 from Sebastian Pop  ---
r227572


[Bug tree-optimization/62113] [graphite] ICE using -floop-parallelize-all

2015-10-09 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62113

Sebastian Pop  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||spop at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #3 from Sebastian Pop  ---
Fixed on trunk when built with a recent ISL 0.15, which contains the compute
timeout functions.

$ time gcc -O2 -floop-parallelize-all -c rdft.i
real 0m1.763s


[Bug middle-end/47598] -fgraphite-identity at -O2 breaks profiledbootstrap

2015-10-09 Thread spop at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47598

Sebastian Pop  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Sebastian Pop  ---
Just completed a "make profiledbootstrap" on trunk with BOOT_CFLAGS="-g -O2
-fgraphite-identity -floop-nest-optimize" on an x86_64-linux machine.

